Transactions

Transactions are an integral part of the Neo4j ecosystem. Two components dominate their implementation: the Write-Ahead Log (WAL), which provides atomicity and durability, and the Wait-For Graph (WFG), which is used to detect deadlocks before they occur.

The Write-Ahead Log

The WAL in the Neo4j transaction system ensures atomicity and durability in transactions. Every change made during a transaction is persisted to the log on disk as soon as the request for the change is received, without modifying the database content. When the transaction is committed, the changes are applied to the data store and subsequently removed from the log. So, if the system fails during a commit, the logged transactions can be read back and the database changes reapplied. This guarantees atomic changes and durability of commit operations.


A transaction moves through a series of states. On initiating a transaction (with the beginTx() method), a state called TX_STARTED is assigned to it. Similar states are assigned while preparing a transaction, committing it, and rolling it back. The changes stored during a transaction are called commands. Every operation performed on the database, including creation and deletion, corresponds to a command and, collectively, the commands define what the transaction is.

In Neo4j, the WAL is implemented by XaLogicalLog, defined in org.neo4j.kernel.impl.transaction.xaframework in the source, which manages the intermediate files that store commands during a transaction. The LogEntry class provides an abstraction over the way in which XaLogicalLog stores its information; an entry holds information about the phases and the stored commands of a transaction. So, whenever the transaction manager (txManager) indicates a change of phase in a given transaction, or the addition of a command, it notifies XaLogicalLog, which writes an appropriate entry to the file.

Transaction logs are stored as files in the root directory of the database. The first file, nioneo_logical.log.active, is simply a marker that indicates which of the log files is currently active. The remaining files are the log files themselves, which follow the naming convention nioneo_logical.log.1 or nioneo_logical.log.2; only one of them is active at a given time, and it is read and written through either a memory-mapped buffer or a heap buffer, as defined by the use_memory_mapped_buffers configuration parameter. Neo4j also has an option to maintain backup files in a versioned manner through the configuration parameter keep_logical_logs. The backups are stored using the nioneo_logical.log.v<version_no> format. In effect, if you have chosen to keep backups, your log files are not deleted after the transaction; instead, they are renamed to backup files.

Each logical log entry carries an integer identifier for its transaction, assigned by XaLogicalLog. XaLogicalLog also maintains xidIdentMap, which maps the identifier to the LogEntry.Start entry in order to reference active transactions. Write operations are always appended to the log after the file offset of the start entry, so all information about a transaction can be obtained by reading from that offset onward. To optimize lookup time, the offset of the Start entry is stored in xidIdentMap against the identifier of the transaction; there is then no need to scan the log file for the transaction, because we can go directly to its indicated start.

The LogEntry.Prepare state is written when the current transaction is being prepped for a commit. When the commit itself is initiated, the state written is either LogEntry.OnePhaseCommit or LogEntry.TwoPhaseCommit, depending on whether we are writing to EmbeddedGraphDatabase or to a distributed scenario (generally using a JTA/JTS service), respectively. When a transaction is completed and no longer needs to exist in an active state, the LogEntry.Done state is written, and the mapping from the identifier to the start entry is removed from xidIdentMap. LogEntry.Command is not a state as such, but an encapsulation of the transaction's commands. The writeCommand() method of XaLogicalLog takes a command as an argument and writes it to disk.
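
The bookkeeping described above can be illustrated with a minimal in-memory sketch. It is not Neo4j's XaLogicalLog: the class ToyLogicalLog, its method names other than writeCommand(), and the use of a list in place of a file are all assumptions made for illustration. The map of start offsets plays the role the text assigns to xidIdentMap.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a logical log; a List replaces the on-disk file.
class ToyLogicalLog {
    enum Type { START, COMMAND, PREPARE, ONE_PHASE_COMMIT, TWO_PHASE_COMMIT, DONE }

    static final class Entry {
        final Type type;
        final int txId;
        final String payload; // only meaningful for COMMAND entries

        Entry(Type type, int txId, String payload) {
            this.type = type;
            this.txId = txId;
            this.payload = payload;
        }
    }

    private final List<Entry> entries = new ArrayList<>();               // append-only log
    private final Map<Integer, Integer> startOffsets = new HashMap<>();  // txId -> offset of its START entry
    private int nextTxId = 1;

    // Writes a START entry and remembers where it sits in the log.
    int start() {
        int txId = nextTxId++;
        startOffsets.put(txId, entries.size());
        entries.add(new Entry(Type.START, txId, null));
        return txId;
    }

    void writeCommand(int txId, String command) { append(Type.COMMAND, txId, command); }
    void prepare(int txId) { append(Type.PREPARE, txId, null); }
    void commitOnePhase(int txId) { append(Type.ONE_PHASE_COMMIT, txId, null); }

    // Writes DONE and drops the transaction from the map of active transactions.
    void done(int txId) {
        append(Type.DONE, txId, null);
        startOffsets.remove(txId);
    }

    // Reads a transaction's commands by jumping straight to its START offset.
    List<String> commandsOf(int txId) {
        Integer offset = startOffsets.get(txId);
        if (offset == null) {
            throw new IllegalStateException("transaction " + txId + " is not active");
        }
        List<String> result = new ArrayList<>();
        for (int i = offset; i < entries.size(); i++) {
            Entry e = entries.get(i);
            if (e.txId == txId && e.type == Type.COMMAND) {
                result.add(e.payload);
            }
        }
        return result;
    }

    int activeCount() { return startOffsets.size(); }
    int size() { return entries.size(); }

    private void append(Type type, int txId, String payload) {
        if (!startOffsets.containsKey(txId)) {
            throw new IllegalStateException("transaction " + txId + " is not active");
        }
        entries.add(new Entry(type, txId, payload));
    }
}
```

Nothing is ever removed from the entry list; only the map shrinks when a Done entry is written, which mirrors the append-only behavior of the log.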

LogEntry state                  Operation for trigger
Start                           This indicates that the transaction is now active
Prepare                         This indicates that the transaction is being prepped for a commit
OnePhaseCommit                  This initiates a commit in EmbeddedGraphDatabase
TwoPhaseCommit                  This initiates commits in a distributed scenario
Done                            This indicates that a transaction is complete
Command (not an actual state)   Encapsulation for the commands in the transaction

So, every state change of a transaction is stored as a LogEntry that contains the state indicator flags and the transaction identifier. Nothing is ever deleted from the log; writing a Done state simply indicates that the transaction has finished. The commands that caused the state changes are persisted to disk as well.

We mentioned that all commands are appended with no deletions, so storing them on disk can create massive files for large transactions. That is where log rotation comes in. It is triggered once the size of the log file exceeds a threshold (the default value is 10 MB): the rotate() method of XaLogicalLog is invoked when the threshold is exceeded while a command is being appended and no live transaction takes up more than 5 MB of space. The rotate() method performs the following:

  1. Checks the currently used log file from the .active file, which stores the reference.
  2. Writes the content of the buffer for the log file to disk and creates the new log file with the version and identifier of the last committed transaction in the header.
  3. Initiates reading of entries from the offset of Start. All LogEntries that belong to the current transaction are copied to the new log file and offset is updated accordingly.
  4. Disposes of the previous log file and updates the reference to the new log file in the .active file.

All the operations are synchronized, which pauses all updating transactions till the rotate operations are over.
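
The copying step of a rotation can be sketched as follows. This is an illustration under assumptions, not the rotate() method of XaLogicalLog: the class LogRotationSketch, its Entry and Rotated types, and the use of lists in place of files are hypothetical, and the sketch copies the entries of every transaction that is still active.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the copy phase of log rotation; lists stand in for log files.
class LogRotationSketch {
    static final class Entry {
        final int txId;
        final String kind; // "START", "COMMAND", "DONE", ...

        Entry(int txId, String kind) {
            this.txId = txId;
            this.kind = kind;
        }
    }

    static final class Rotated {
        final List<Entry> newLog;
        final Map<Integer, Integer> startOffsets;

        Rotated(List<Entry> newLog, Map<Integer, Integer> startOffsets) {
            this.newLog = newLog;
            this.startOffsets = startOffsets;
        }
    }

    // Copies the entries of still-active transactions into a fresh log and
    // recomputes the offset of each START entry in that new log.
    static Rotated rotate(List<Entry> oldLog, Map<Integer, Integer> activeStartOffsets) {
        List<Entry> newLog = new ArrayList<>();
        Map<Integer, Integer> newOffsets = new HashMap<>();
        if (activeStartOffsets.isEmpty()) {
            return new Rotated(newLog, newOffsets); // nothing live, the new log starts empty
        }
        // No live entry can precede the earliest START of an active transaction.
        int first = Collections.min(activeStartOffsets.values());
        for (int i = first; i < oldLog.size(); i++) {
            Entry e = oldLog.get(i);
            if (!activeStartOffsets.containsKey(e.txId)) {
                continue; // belongs to a finished transaction, so it is dropped
            }
            if (e.kind.equals("START")) {
                newOffsets.put(e.txId, newLog.size());
            }
            newLog.add(e);
        }
        return new Rotated(newLog, newOffsets);
    }
}
```

A real implementation would run this under the synchronization described above, so that no transaction appends to the old log while its entries are being copied.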

How does all this facilitate recovery? When XaLogicalLog is shut down, if the map is empty and no transactions are live, a marker that indicates a clean close is stored in the .active file, and the log files are removed. So, when a restart occurs and the .active file is in the "nonclean" (or not closed) mode, it means that there are transactions pending. In this case, the last active log file is found from the .active file and the doInternalRecovery() method of XaLogicalLog is started. The dangling transactions are recreated and reattempted.

The setRecovered() method is used to indicate that a transaction has been successfully recovered, which avoids its re-entry into the WAL during subsequent recovery processes.
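
As a rough illustration of how dangling transactions could be found after a nonclean shutdown, the following sketch scans a log and collects the commands of every transaction that has a Start entry but no Done entry. It is not the doInternalRecovery() method; the class RecoveryScanSketch and its Entry type are hypothetical, and entry kinds are plain strings for brevity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical recovery scan; not Neo4j's actual recovery code.
class RecoveryScanSketch {
    static final class Entry {
        final int txId;
        final String kind;    // "START", "COMMAND", "PREPARE", "DONE", ...
        final String command; // only set for COMMAND entries

        Entry(int txId, String kind, String command) {
            this.txId = txId;
            this.kind = kind;
            this.command = command;
        }
    }

    // Returns the commands of each transaction that never reached DONE,
    // keyed by transaction id in the order the transactions started.
    static Map<Integer, List<String>> danglingTransactions(List<Entry> log) {
        Map<Integer, List<String>> pending = new LinkedHashMap<>();
        for (Entry e : log) {
            switch (e.kind) {
                case "START": {
                    pending.put(e.txId, new ArrayList<>());
                    break;
                }
                case "COMMAND": {
                    List<String> commands = pending.get(e.txId);
                    if (commands != null) {
                        commands.add(e.command);
                    }
                    break;
                }
                case "DONE": {
                    pending.remove(e.txId); // finished, nothing to recover
                    break;
                }
                default:
                    break; // phase markers carry no commands
            }
        }
        return pending;
    }
}
```

The commands returned for each dangling transaction are what a recovery pass would feed back into the transaction before reattempting it.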

Detecting deadlocks

Neo4j, being an ACID database solution, needs to ensure that every transaction is completed (whether successfully or unsuccessfully), so that no active thread is left waiting forever and deadlocks are avoided in the process. The core components that provide this functionality include RWLock (Read Write Lock), LockManager, and RagManager (Resource Allocation Graph Manager).

RWLock

RWLock is Neo4j's implementation of a lock in the style of the Java ReentrantReadWriteLock, which allows concurrent reads but single-threaded, exclusive write access to the locked resource. Being re-entrant, it allows the holder of a lock to acquire the same lock again. The lock also uses RagManager to detect whether waiting on a resource could lead to a deadlock. Essentially, RWLock maintains a reference to the lock holders, that is, the threads and their counts of read and write locks. When a thread requests the read lock, RWLock checks whether any write locks exist; if none exist, or if the existing ones are held by the calling thread itself, the lock is granted. Otherwise, RagManager is used to determine whether the thread can be allowed to wait without creating a deadlock. Write locks are handled in a similar fashion. When a lock is released, the counts are reduced and waiting threads are woken in FIFO order.
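
The locking semantics described here can be observed with the JDK's own java.util.concurrent.locks.ReentrantReadWriteLock. The sketch below uses that class, not Neo4j's RWLock, and performs no deadlock detection; the GuardedCounter class is a hypothetical example. It shows a write-lock holder re-acquiring the write lock and then taking the read lock, both of which succeed because the same thread already holds the exclusive lock.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical example built on the JDK lock; it is not Neo4j's RWLock.
class GuardedCounter {
    // Fair mode hands the lock out in approximately arrival order.
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
    private long value;

    // Many threads may hold the read lock at the same time.
    long read() {
        lock.readLock().lock();
        try {
            return value;
        } finally {
            lock.readLock().unlock();
        }
    }

    long incrementTwice() {
        lock.writeLock().lock();          // exclusive access
        try {
            value++;
            lock.writeLock().lock();      // re-entrant: the holder may acquire it again
            try {
                value++;
            } finally {
                lock.writeLock().unlock();
            }
            return read();                // the write-lock holder may also take the read lock
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Number of write-lock holds owned by the calling thread.
    int writeHolds() {
        return lock.getWriteHoldCount();
    }
}
```

Each lock() call is paired with an unlock() in a finally block, so the hold counts return to zero even if the guarded code throws.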

RagManager

RagManager operates primarily through the checkWaitOn() and checkWaitOnRecursive() utility methods. It is informed of all acquired and released locks on resources. Before invoking wait() on a thread, RWLock gets possible deadlock information from RagManager. RagManager is essentially a WFG that stores a graph of the resources and the threads waiting on them. The checkWaitOn() method traverses the WFG to find whether a back edge exists to the candidate that needs a lock, in which case a DeadlockDetectedException is raised, which terminates the thread's attempt to wait. The transaction therefore does not complete, and atomicity is enforced. In this way, cycles of waiting transactions are avoided.
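
The idea of checking a wait-for graph before letting a thread wait can be sketched as a plain cycle check. This is a simplified illustration, not RagManager: the class WaitForGraph and its methods are hypothetical, waiters and holders are identified by strings, and an IllegalStateException stands in for DeadlockDetectedException. A new edge from waiter to holder would close a cycle exactly when the holder can already reach the waiter.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical wait-for graph; an edge A -> B means "A is waiting for B".
class WaitForGraph {
    private final Map<String, Set<String>> waitsFor = new HashMap<>();

    // True if letting 'waiter' wait for 'holder' would create a cycle.
    boolean wouldDeadlock(String waiter, String holder) {
        if (waiter.equals(holder)) {
            // A re-entrant lock never makes a thread wait on itself.
            throw new IllegalArgumentException("a thread does not wait on itself");
        }
        return reaches(holder, waiter);
    }

    // Records the wait, or refuses it if it would deadlock.
    void addWait(String waiter, String holder) {
        if (wouldDeadlock(waiter, holder)) {
            throw new IllegalStateException(
                    "deadlock detected: " + waiter + " cannot wait for " + holder);
        }
        waitsFor.computeIfAbsent(waiter, k -> new HashSet<>()).add(holder);
    }

    // Called when the waiter gets the lock or gives up.
    void removeWait(String waiter, String holder) {
        Set<String> holders = waitsFor.get(waiter);
        if (holders != null) {
            holders.remove(holder);
            if (holders.isEmpty()) {
                waitsFor.remove(waiter);
            }
        }
    }

    // Iterative depth-first search along existing wait edges.
    private boolean reaches(String from, String target) {
        Deque<String> stack = new ArrayDeque<>();
        Set<String> visited = new HashSet<>();
        stack.push(from);
        while (!stack.isEmpty()) {
            String current = stack.pop();
            if (current.equals(target)) {
                return true;
            }
            if (visited.add(current)) {
                for (String next : waitsFor.getOrDefault(current, Collections.<String>emptySet())) {
                    stack.push(next);
                }
            }
        }
        return false;
    }
}
```

For example, if T1 waits for T2 and T2 waits for T3, a request by T3 to wait for T1 is refused, because T1 already reaches T3 through the existing edges.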

LockManager

The sole purpose of the LockManager class is to synchronize access to RWLock instances: it creates the locks whenever required, passes each of them an instance of RagManager, and removes them when appropriate. At a high level of abstraction, Neo4j uses this class for the purpose of locking.

The scheme of locks and deadlock detection simply ensures that the graph primitives are not granted locks in an order that can lead to a deadlock. It does not, however, protect you from application-level deadlocks that arise in your own multithreaded code.

XaTransaction

The class in Neo4j that is the central authority in the transactional behavior is XaTransaction. For any transaction that deals with a Neo4j resource, the fundamental structure is defined by this class, which holds the XaLogicalLog used to persist the state of the transaction, its operations, and the transaction identifier. It also includes the addCommand() method, which is used for normal transactional operations, and the injectCommand() method, which is used at the time of recovery. WriteTransaction, the core class in Neo4j that implements transactions, extends the XaTransaction class, thereby exposing the extended interface. Two types of fields are dealt with here:

  • Records: These are stored as a map from integer to record for each type of primitive, where the integer is the ID of the primitive
  • Commands: These are stored in the form of command object lists

In the course of normal operations, the actions performed on a Neo4j primitive are stored in the form of record objects. As per the operation, the record is modified and placed in its appropriate map. In the prepare stage, the records are transformed into commands and put in the corresponding command object list. At this point, an apt LogEntry.Command entry is written to XaLogicalLog. When doCommit() is invoked, the commands are executed one by one, the locks held are released, and finally the commands are cleared. If a request for doRollback() is received, the records in the map are checked. If a record has been flagged as created, its ID is freed by the underlying store and, subsequently, the command and record collections are cleared. If a transaction results in failure, an implicit rollback is initiated; at recovery time, injectCommand() is used to add the commands directly to the command list prior to the next commit operation. The IDs that are not yet freed are recovered from the IdGenerator of the underlying storage when the database is restarted.
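
The record-to-command life cycle can be sketched in a few lines. This is a simplified model, not Neo4j's WriteTransaction: the class ToyWriteTransaction, its NodeRecord type, and the use of a plain map as the store are hypothetical, and only the method names prepare(), doCommit(), and doRollback() follow the text. Locking and logging to disk are left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a write transaction; a Map stands in for the node store.
class ToyWriteTransaction {
    static final class NodeRecord {
        final int id;
        final boolean created; // true if this transaction allocated the ID
        final String value;

        NodeRecord(int id, boolean created, String value) {
            this.id = id;
            this.created = created;
            this.value = value;
        }
    }

    private final Map<Integer, NodeRecord> nodeRecords = new LinkedHashMap<>(); // ID -> record
    private final List<NodeRecord> commands = new ArrayList<>();
    private final List<String> log = new ArrayList<>(); // stand-in for the logical log

    void createNode(int id, String value) {
        nodeRecords.put(id, new NodeRecord(id, true, value));
    }

    void updateNode(int id, String value) {
        NodeRecord old = nodeRecords.get(id);
        boolean created = old != null && old.created; // keep the flag if we created it earlier
        nodeRecords.put(id, new NodeRecord(id, created, value));
    }

    // Turns every record into a command and logs it before anything touches the store.
    void prepare() {
        for (NodeRecord record : nodeRecords.values()) {
            commands.add(record);
            log.add("COMMAND node " + record.id);
        }
    }

    // Executes the commands one by one against the store, then clears them.
    void doCommit(Map<Integer, String> store) {
        for (NodeRecord command : commands) {
            store.put(command.id, command.value);
        }
        clear();
    }

    // Returns the IDs to free: those of records created by this transaction.
    List<Integer> doRollback() {
        List<Integer> freedIds = new ArrayList<>();
        for (NodeRecord record : nodeRecords.values()) {
            if (record.created) {
                freedIds.add(record.id);
            }
        }
        clear();
        return freedIds;
    }

    List<String> logEntries() {
        return new ArrayList<>(log);
    }

    private void clear() {
        nodeRecords.clear();
        commands.clear();
    }
}
```

Keeping the records in memory until prepare() is what allows a rollback to be cheap: nothing has reached the store yet, so only the newly allocated IDs need to be handed back.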

Commands

The Command class extends XaCommand to be used in NeoStore. The Command class defines a way to store a command in LogBuffer, read it back, and then execute it. NodeCommand is treated differently from RelationshipCommand, and likewise for every primitive. From the operations perspective, NodeCommand in Command has two essential components to work with: NodeRecord, which stores the changes that need to be performed on the store, and NodeStore, which persists the changes. When execution is initiated, the store is asked to perform the updates held in NodeRecord. To persist the command to disk, the writeToFile() method is used, which sets a marker to the entry and writes the record fields. In order to read it back, the readCommand() method is invoked, which reconstructs NodeCommand. The command types for the other primitives follow the same procedure.

TransactionImpl, TxManager, and TxLog

We have seen that transactions can be implemented over NeoStore. Likewise, there can also be transactions over a Lucene index. All these transactions can be committed. However, since the transactions on indexes and on primitives are connected, if WriteTransaction results in failure, then LuceneTransaction must also fail, and vice versa. The TransactionImpl class takes care of this. Resources are added to TransactionImpl with the help of enlistResource(), and they are bound together by the two-phase commit (2PC) protocol. When a commit operation is requested, all the enlisted resources are asked to prepare, and each returns a status that indicates whether its changes succeeded or not. When all return an OK, they proceed with the commit; otherwise, a rollback is initiated. Also, each time the status of a transaction changes, a notification is sent to TxManager and the corresponding record is added to txLog. This WAL is used for failure recovery. The identifier for TransactionImpl is called globalId, and every resource that enlists with it is assigned a corresponding branchId; the two are bound together in an abstraction called Xid. So, when a resource is enlisted for the first time, it calls TxManager and writes a record to txLog that marks the initiation of a new transaction. In the case of a failure, when a transaction needs to be reconstructed from the log, we can associate the transaction with the resources that were being managed by it.
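
The prepare-then-commit flow can be sketched as follows. This is a minimal model of two-phase commit, not Neo4j's TransactionImpl: the ToyResource interface and the RecordingResource and ToyTransaction classes are hypothetical, only the method name enlistResource() follows the text, and the notifications to TxManager and txLog are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical resource contract for a two-phase commit.
interface ToyResource {
    boolean prepare(); // phase one: vote on whether the changes can be committed
    void commit();     // phase two, when every vote was OK
    void rollback();   // phase two, when any vote failed
}

// A resource that records what it was asked to do; its vote is fixed up front.
class RecordingResource implements ToyResource {
    private final String name;
    private final boolean vote;
    private final List<String> events;

    RecordingResource(String name, boolean vote, List<String> events) {
        this.name = name;
        this.vote = vote;
        this.events = events;
    }

    @Override
    public boolean prepare() {
        events.add(name + ":prepare");
        return vote;
    }

    @Override
    public void commit() {
        events.add(name + ":commit");
    }

    @Override
    public void rollback() {
        events.add(name + ":rollback");
    }
}

// Coordinates the enlisted resources so that they commit or roll back together.
class ToyTransaction {
    private final List<ToyResource> resources = new ArrayList<>();

    void enlistResource(ToyResource resource) {
        resources.add(resource);
    }

    // Returns true if all resources committed, false if the transaction was rolled back.
    boolean commit() {
        for (ToyResource resource : resources) {
            if (!resource.prepare()) {
                for (ToyResource enlisted : resources) {
                    enlisted.rollback(); // one failed vote undoes every resource
                }
                return false;
            }
        }
        for (ToyResource resource : resources) {
            resource.commit();
        }
        return true;
    }
}
```

Because no resource commits until every resource has voted OK, a failure in the index resource cannot leave the store resource committed on its own, which is the coupling between WriteTransaction and LuceneTransaction that the text describes.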

TxManager abstracts the TransactionImpl object from the program. On starting a new transaction, a TransactionImpl object is created and mapped to the thread currently in execution. All methods in TxManager (except the resume method) automatically receive the TransactionImpl object. TxLog works in a similar fashion to XaLogicalLog with regard to the writing of entries and the rotation of files.

So, if your system crashes during the execution phase of commands in a transaction, without a rollback, then what happens? In such a situation, the complete transaction is replayed and the commands that were already executed before the failure would be processed again.
