Managing transactions

Consider businesses that generate large volumes of critical data and must operate on it in real time. On one hand, there are services such as Twitter or IMDb, where the volume of data is high but its criticality is not the top priority. On the other hand, there are firms that handle high volumes of connected financial or medical data, where maintaining the integrity of the data is of the utmost importance. Such scenarios require ACID transactions, for which most databases today have built-in support. Neo4j is a fully ACID database, as we discussed in Chapter 1, Getting Started with Neo4j; it ensures the following properties with respect to transactions:

  • Atomicity: When any part of the transaction is unsuccessful, the state of the database is left unchanged
  • Consistency: The transaction takes the database from one consistent state to another
  • Isolation: During a transaction, the data it modifies cannot be accessed by other operations
  • Durability: The results of a committed transaction can always be recovered, even after a failure

Neo4j requires that all graph access, indexing, and schema-altering operations be processed in transactions. Isolation is implemented with the help of locks. By default, reads acquire no locks, so nonrepeatable reads are possible, whereas writes acquire write locks that are released only when the transaction terminates. You can also acquire locks manually on entities (nodes or relationships) to obtain a higher isolation level such as SERIALIZABLE. The default level is READ_COMMITTED. The core API for transactions also has provisions to handle deadlocks, which we will discuss later in the chapter.

A transaction is confined to the thread that started it. You can also nest transactions, in which case the nested transactions become part of the scope of the highest-level transaction. These are referred to as flat nested transactions. In such transactions, when an exception occurs in a nested transaction, the complete highest-level transaction has to be rolled back, since the alterations of a nested transaction alone cannot be rolled back.
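The flat nesting behavior described above can be illustrated with a small model in plain Java. This is only a sketch: FlatTx and its methods are hypothetical names invented for illustration and are not part of the Neo4j API. It shows that nested scopes share the state of the top-level transaction and that a failure at any level rolls back everything.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of flat nesting. FlatTx is a hypothetical class, not part of Neo4j.
class FlatTx {
    private int depth = 0;                 // how many begin() calls are still open
    private boolean rollbackOnly = false;  // set by a failure at any nesting level
    private final List<String> pending = new ArrayList<>();   // uncommitted changes
    private final List<String> committed = new ArrayList<>(); // committed state

    void begin() { depth++; }

    void write(String change) {
        if (depth == 0) throw new IllegalStateException("no transaction in progress");
        pending.add(change);
    }

    // A failure anywhere marks the whole top-level transaction for rollback.
    void fail() { rollbackOnly = true; }

    void end() {
        if (depth == 0) throw new IllegalStateException("no transaction in progress");
        depth--;
        if (depth > 0) return;             // nested scope: nothing is committed yet
        if (!rollbackOnly) committed.addAll(pending);
        pending.clear();                   // commit or rollback, the pending set is gone
        rollbackOnly = false;
    }

    List<String> committed() { return committed; }
}
```

In this model, calling fail() inside an inner begin()/end() pair discards the writes of the outer scope as well, which mirrors the rule that a nested transaction cannot be rolled back on its own.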

The database constantly monitors the transaction state, which basically involves the following operations:

  1. A transaction begins.
  2. Operations are performed on the database.
  3. The transaction is marked as a success or a failure.
  4. The transaction finishes.

The transaction must finish in order to release the acquired locks and the memory used. In Neo4j, we use a try-finally block in which the transaction is started and the write operations are performed. The try block should end by marking the transaction as successful, and the transaction should be finished in the finally block, where the commit or rollback is performed depending upon the success status of the transaction. Keep in mind that all alterations performed in a transaction are held in memory until it finishes, which is why, for high-volume scenarios, we need to divide the updates into multiple top-level transactions to avoid running out of memory. The basic pattern looks like this:

Transaction tx = graphDb.beginTx();
try {
    // operations on the graph
    // ...

    tx.success();
} finally {
    tx.close();
}
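The advice about dividing large updates into several top-level transactions can be sketched as follows. The helper below is plain Java and only splits the work into batches; BatchSplitter and the batch size are assumptions made for illustration, and each batch would then be processed inside its own begin, success, and close cycle as in the snippet above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a large list of updates into fixed-size batches
// so that each batch can be applied in its own top-level transaction.
class BatchSplitter {
    static <T> List<List<T>> chunks(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> result = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so that each batch is independent of the source list
            result.add(new ArrayList<>(items.subList(start, end)));
        }
        return result;
    }
}
```

A suitable batch size depends on the size of each update and the memory available, so it is best found by measurement rather than fixed in advance.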

Since transactions are tied to threads, other errors can occur in an environment that uses thread pools when a transaction fails. A transaction that has not been finished properly is not terminated; it stays attached to its thread, and if it has been marked for rollback, any write operation attempted within it results in an error. When performing a read operation, the last committed value is read, unless the current transaction has itself modified the data before the read. By default, the isolation level is READ_COMMITTED, which means that no locks are imposed on read operations, and hence, reads can be nonrepeatable. If you manually acquire the read and write locks, you can implement a higher level of isolation, namely REPEATABLE_READ or SERIALIZABLE. Generally, write locks are taken when you create, modify, or delete an entity, as outlined in the following points:

  • A node or relationship is write-locked when you add, change, or remove its properties.
  • Creating or deleting a node or relationship takes a write lock on that entity. For relationships, the two connected nodes are write-locked as well.

Neo4j comes equipped with deadlock detection: a deadlock arising from the locking arrangement is detected before it actually occurs, and Neo4j throws an exception to indicate it. The transaction is also flagged for rollback before the exception is thrown. When the locks held are released in the finally block, other transactions that were waiting on the resource can acquire the lock and proceed. The user can then choose to retry the failed transaction at a later time.
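Retrying a deadlocked transaction can be sketched as a small loop. The code below is plain Java with a stand-in exception: DeadlockException and DeadlockRetry are hypothetical names used only for illustration, so in a real application you would catch whatever exception your Neo4j version raises on deadlock and run the full transaction (begin, work, success, close) inside the supplied unit of work.

```java
import java.util.function.Supplier;

// Stand-in for the exception raised when a deadlock is detected (hypothetical).
class DeadlockException extends RuntimeException {
    DeadlockException(String message) { super(message); }
}

class DeadlockRetry {
    // Runs the work, retrying with a growing pause when a deadlock is reported.
    static <T> T run(Supplier<T> work, int maxAttempts, long backoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        DeadlockException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return work.get();
            } catch (DeadlockException e) {
                last = e;
                if (attempt < maxAttempts) {
                    try {
                        // Give the competing transaction time to finish
                        Thread.sleep(backoffMillis * attempt);
                    } catch (InterruptedException interrupted) {
                        Thread.currentThread().interrupt();
                        throw e;
                    }
                }
            }
        }
        throw last; // every attempt deadlocked
    }
}
```

The retry count and pause are arbitrary here; the important point is that each attempt starts a fresh transaction, since the deadlocked one has already been marked for rollback.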

Deadlock handling

When deadlocks occur frequently, it is generally an indication that the concurrent write requests cannot be executed together while maintaining consistency and isolation. To avoid such scenarios, concurrent updates must be executed in a predictable fashion. For example, deadlocks can happen when we create or delete relationships between two given nodes in random order. The solution is either to always execute the updates in a specific order (always first on node 1 and then on node 2) or to manually ensure that there are no conflicting operations in the concurrent threads by confining similar types of operations to a single thread.
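The idea of always updating in a fixed order can be shown with ordinary Java locks. This is a sketch of the general technique rather than of Neo4j's internal locking: LockedNode and OrderedLocking are hypothetical names, and node IDs are assumed to be unique. Both threads lock the node with the smaller ID first, whatever order the nodes are passed in, so they can never wait on each other in a cycle.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a graph node with its own lock.
class LockedNode {
    final long id;                          // assumed unique per node
    final ReentrantLock lock = new ReentrantLock();
    int degree = 0;                         // number of attached relationships
    LockedNode(long id) { this.id = id; }
}

class OrderedLocking {
    // Always lock the node with the smaller id first, regardless of argument order.
    static void connect(LockedNode a, LockedNode b) {
        LockedNode first = a.id <= b.id ? a : b;
        LockedNode second = (first == a) ? b : a;
        first.lock.lock();
        try {
            second.lock.lock();
            try {
                a.degree++;
                b.degree++;
            } finally {
                second.lock.unlock();
            }
        } finally {
            first.lock.unlock();
        }
    }

    // Two threads connect the same pair of nodes in opposite argument order.
    static void hammer(LockedNode a, LockedNode b, int rounds) {
        Thread t1 = new Thread(() -> { for (int i = 0; i < rounds; i++) connect(a, b); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < rounds; i++) connect(b, a); });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

If connect locked its arguments in the order given, the two threads in hammer could each hold one lock while waiting for the other, which is exactly the situation the fixed ordering rules out.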

All operations performed through the Neo4j API are thread-safe unless specified otherwise, so no external synchronization is needed. Consequently, any synchronized blocks in your own code should not include Neo4j operations. Neo4j has one special case when deleting nodes or relationships. When you delete a node or relationship, its properties are deleted automatically, but the relationships of a node are not. Why? Because Neo4j enforces the constraint that every relationship must have a valid start node and end node. So, if you try to delete a node that still has relationships attached, an exception is raised when the transaction is committed. The transaction must therefore be planned in such a way that no relationships to a deleted node exist when the transaction is about to be committed. The semantics of delete operations are summarized as follows:

  • When you delete a node or relationship, all properties are deleted.
  • Before committing, a node must not have relationships attached to it.
  • A node or relationship is not actually deleted until the commit takes place; hence, you can still obtain a reference to a deleted entity before the commit. However, you cannot write to such a reference.
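The commit-time rule for deletions can be modeled in a few lines of plain Java. ToyGraph below is a hypothetical in-memory structure written for illustration, not the Neo4j API; it rejects a commit when a deleted node still has relationships and offers a helper that removes the relationships first.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of the delete rules; not the Neo4j API.
class ToyGraph {
    private final Set<Long> nodes = new HashSet<>();
    private final Map<Long, long[]> rels = new HashMap<>(); // relId -> {start, end}
    private final Set<Long> deletedNodes = new HashSet<>(); // pending until commit
    private long nextId = 0;

    long createNode() { long id = nextId++; nodes.add(id); return id; }

    long createRelationship(long start, long end) {
        long id = nextId++;
        rels.put(id, new long[] { start, end });
        return id;
    }

    void deleteRelationship(long relId) { rels.remove(relId); }

    // The node is only marked here; it disappears when commit() succeeds.
    void deleteNode(long nodeId) { deletedNodes.add(nodeId); }

    // Removes every relationship touching the node, then deletes the node.
    void detachAndDelete(long nodeId) {
        rels.values().removeIf(r -> r[0] == nodeId || r[1] == nodeId);
        deleteNode(nodeId);
    }

    void commit() {
        for (long[] r : rels.values()) {
            if (deletedNodes.contains(r[0]) || deletedNodes.contains(r[1])) {
                throw new IllegalStateException("deleted node still has relationships");
            }
        }
        nodes.removeAll(deletedNodes);
        deletedNodes.clear();
    }

    boolean hasNode(long id) { return nodes.contains(id); }
}
```

Calling deleteNode on a connected node and then commit fails in this model, whereas detachAndDelete followed by commit succeeds, which is the ordering a real transaction has to respect.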

Uniqueness of entities

Duplication is another issue to deal with when multithreaded operations are in play. There may be only one player with a given name in the world, but transactions on concurrent threads that try to create such a node can end up creating duplicate entities. Such duplication needs to be prevented. One naïve approach is to use a single thread to create the particular entities. A more popular approach is to use the get_or_create operation. Uniqueness is guaranteed with the help of indexing: legacy indices are used as locks on the smallest unique identity of the entity, so that the entity is created only if the lookup for it fails; otherwise, the existing entity is simply returned. The get_or_create concept exists for Cypher as well as the Java API, and it ensures uniqueness across all transactions and threads.
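The get_or_create idea can be illustrated outside the database with a concurrent map, where the lookup and the creation happen as one atomic step. This is only an analogy in plain Java: PlayerRegistry is a hypothetical class, and Neo4j's own mechanism relies on its indexes rather than on a map.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical registry that hands out one id per unique name.
class PlayerRegistry {
    private final ConcurrentHashMap<String, Integer> idsByName = new ConcurrentHashMap<>();
    private final AtomicInteger created = new AtomicInteger();

    // Returns the existing id for the name, or creates exactly one new entry.
    int getOrCreate(String name) {
        return idsByName.computeIfAbsent(name, key -> created.incrementAndGet());
    }

    int createdCount() { return created.get(); }

    // Several threads race to create the same name on a fresh registry.
    static PlayerRegistry race(String name, int threadCount) {
        PlayerRegistry registry = new PlayerRegistry();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> registry.getOrCreate(name));
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return registry;
    }
}
```

However many threads call getOrCreate with the same name, the creation function runs once for that key and every caller receives the same id.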

There is also a third technique called pessimistic locking, in which a lock is manually acquired on a single node or a set of common nodes and used to synchronize the creation. However, this approach does not apply to a high-availability scenario.

Events for transactions

Event handlers for transactions keep track of what happens in the course of a transaction before it is committed. Once you register an event handler with an instance of GraphDatabaseService, it can receive events. Handlers are not notified if the transaction does not perform any writes or fails to commit. There are two methods, beforeCommit and afterCommit, which receive the changes in the data (the difference) made by that transaction; this difference constitutes the event.
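The notification rules above can be modeled with a small listener interface in plain Java. CommitListener, NotifyingStore, and RecordingListener are hypothetical names made up for this sketch and only mimic the behavior described here; they are not the Neo4j event handler types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener that receives the set of changes of a transaction.
interface CommitListener {
    void beforeCommit(List<String> changes);
    void afterCommit(List<String> changes);
}

class NotifyingStore {
    private final List<CommitListener> listeners = new ArrayList<>();
    private final List<String> data = new ArrayList<>();

    void register(CommitListener listener) { listeners.add(listener); }

    // Applies the changes as one transaction; success=false models a
    // transaction that was never marked successful and is rolled back.
    void transact(List<String> changes, boolean success) {
        if (changes.isEmpty() || !success) {
            return; // read-only or rolled back: listeners are not notified
        }
        for (CommitListener l : listeners) l.beforeCommit(changes);
        data.addAll(changes);
        for (CommitListener l : listeners) l.afterCommit(changes);
    }

    int size() { return data.size(); }
}

// Records which callbacks fired and how many changes each one saw.
class RecordingListener implements CommitListener {
    final List<String> log = new ArrayList<>();
    public void beforeCommit(List<String> changes) { log.add("before:" + changes.size()); }
    public void afterCommit(List<String> changes) { log.add("after:" + changes.size()); }
}
```

In this model, a read-only or failed transaction leaves the log of a RecordingListener empty, while a successful write produces one before entry and one after entry carrying the same set of changes.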

Let's now see a simple example where a transaction is executed through the Java API to see how the components fit together:

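import org.neo4j.graphdb.Direction;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;

// DB_PATH, registerShutdownHook(), and RelTypes are assumed to be defined
// elsewhere in the enclosing class.
public void transactionDemo() {

    GraphDatabaseService graphDatabase;
    Node node1;
    Node node2;
    Relationship rel;

    graphDatabase = new GraphDatabaseFactory().newEmbeddedDatabase( DB_PATH );
    registerShutdownHook( graphDatabase );

    Transaction txn = graphDatabase.beginTx();
    try {
        // Create two nodes and connect them with a relationship
        node1 = graphDatabase.createNode();
        node1.setProperty( "name", "David Tennant" );
        node2 = graphDatabase.createNode();
        node2.setProperty( "name", "Matt Smith" );

        rel = node1.createRelationshipTo( node2, RelTypes.KNOWS );
        rel.setProperty( "name", "precedes " );

        // The relationship must be deleted before the nodes it connects
        node1.getSingleRelationship( RelTypes.KNOWS, Direction.OUTGOING ).delete();
        node1.delete();
        node2.delete();

        txn.success();
    } catch (Exception e) {
        // Mark the transaction for rollback
        txn.failure();
    } finally {
        // Commits or rolls back, depending on the status marked above
        txn.close();
    }
}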

When you are using the Neo4j REST server or operating in high-availability mode, the following request can be used:

POST http://localhost:7474/db/data/transaction/commit
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
  "statements" : [ {
    "statement" : "CREATE (n {props}) RETURN n",
    "parameters" : {
      "props" : {
        "name" : "My Node"
      }
    }
  } ]
}

The preceding REST request begins a transaction, executes the statement, and commits it in a single step. If you want to keep the transaction open for more requests, omit the commit segment from the URL of the POST request as follows:

POST http://localhost:7474/db/data/transaction

To commit at the end of the transaction, post to the commit URL of that transaction (here, 9 is the transaction ID assigned by the server):

POST http://localhost:7474/db/data/transaction/9/commit
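The requests shown above could be built from Java roughly as follows. This sketch assumes Java 11 or later for the java.net.http classes; TransactionalRequests is a hypothetical helper name, and the URL and JSON body are passed in by the caller rather than being fixed by the helper. It only constructs the request, since sending it requires a running server.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper that builds a POST request for the transactional endpoint.
class TransactionalRequests {
    static HttpRequest post(String url, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(url))
                // Same headers as in the raw request shown earlier
                .header("Accept", "application/json; charset=UTF-8")
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }
}
```

To actually send the request, you would pass it to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) and read the JSON response, using the same helper with the open-transaction URL or the commit URL as needed.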

Transactions are the core components that make Neo4j ACID-compliant and suitable for use in scenarios where high volumes of complex critical data are being used. Transactions, if managed efficiently, can make your application robust and consistent, even in scenarios that require real-time updates.
