Design constraints in Neo4j

Neo4j is very versatile in terms of data structuring, but every system has its limitations. A designer ought to know where Neo4j reaches its limits so that they can weigh the size and type of their data and arrive at an efficient data model.

Size of files: Neo4j is based on Java at its core, so its file handling depends on the JVM's nonblocking input/output system. Although the layout of the storage files includes several optimizations for interconnected data, Neo4j does not require raw devices. The limit on file sizes is therefore determined by the underlying operating system's ability to handle large files; there is no built-in limit, which makes Neo4j adaptable to big data scenarios. Internally, Neo4j memory-maps the underlying storage files to the maximum extent possible. When memory becomes a constraint and the system can no longer keep all data in memory, Neo4j uses buffers that dynamically reallocate the memory-mapped input/output windows to the areas of the store where most activity takes place. This helps maintain the speed of ACID transactions.
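As a concrete illustration, the following sketch caps the memory budget that Neo4j may use to map its store files. It assumes the embedded Java API of a Neo4j 3.x release and the GraphDatabaseSettings.pagecache_memory setting; the store path and size value are placeholders for the example, not recommendations.

import java.io.File;

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;
import org.neo4j.graphdb.factory.GraphDatabaseSettings;

public class PageCacheConfigExample {
    public static void main(String[] args) {
        // Open an embedded database and cap the memory used to map the store
        // files; Neo4j reallocates its memory-mapped windows within this budget.
        GraphDatabaseService db = new GraphDatabaseFactory()
                .newEmbeddedDatabaseBuilder(new File("data/constraint-demo"))
                .setConfig(GraphDatabaseSettings.pagecache_memory, "512M")
                .newGraphDatabase();

        // Make sure the database is shut down cleanly when the JVM exits.
        Runtime.getRuntime().addShutdownHook(new Thread(db::shutdown));
    }
}

In a server deployment, the same budget would instead be set in the configuration file rather than in code; the principle of bounding the memory-mapped region is the same.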

Data read speed: Most organizations scale and optimize their hardware to deliver higher business value from existing resources. Neo4j's approach to data reads makes efficient use of all available system hardware. Read operations are not blocked or locked, so deadlocks on reads are not a concern and readers do not wait for writers. Neo4j provides threaded access to the database for reads, so you can run as many simultaneous queries as your underlying system supports, across all available processors. For larger servers, this provides great scale-up options.
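The sketch below illustrates this scale-up behavior: one reader thread per processor, each issuing the same query against an already opened embedded database. It assumes the Neo4j 3.x embedded Java API, where reads are wrapped in a transaction context even though they take no locks; the label and query are placeholders for the example.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Result;
import org.neo4j.graphdb.Transaction;

public class ConcurrentReadExample {
    // 'db' is assumed to be an already opened embedded database,
    // for example the one created in the previous sketch.
    public static void runParallelReads(GraphDatabaseService db) {
        // One reader per available processor; readers never block each other.
        int readers = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(readers);

        for (int i = 0; i < readers; i++) {
            pool.submit(() -> {
                // Read transactions take no locks, so these queries run in parallel.
                try (Transaction tx = db.beginTx();
                     Result result = db.execute("MATCH (n:Person) RETURN count(n) AS c")) {
                    result.forEachRemaining(row -> {});
                    tx.success();
                }
            });
        }
        pool.shutdown();
    }
}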

Data write speeds: Optimizing write speed is something most organizations worry about. Writes occur in two different scenarios:

  • Sustained, continuous transactional writes
  • Bulk write operations for initial data loads, batch processes, or backups

In order to support writes in both of these scenarios, Neo4j has two modes of writing to the underlying storage layer. In normal ACID transactional operation, it maintains isolation, that is, reads can occur for the duration of the write process. When a commit is executed, Neo4j persists the data to disk, and if the system fails, recovery to a consistent state is possible. This requires write access to the disk and the data to be flushed to disk, so the write limit on each machine is the I/O speed of its bus. For production deployments, high-speed flash SSDs are the recommended storage devices, and yes, Neo4j is flash ready.
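A minimal sketch of the first, transactional mode is shown below, assuming the Neo4j 3.x embedded Java API; the labels, property names, and relationship type are made up for the example. The changes become durable only when the transaction is marked successful and committed.

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Label;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.RelationshipType;
import org.neo4j.graphdb.Transaction;

public class TransactionalWriteExample {
    public static void createFriendship(GraphDatabaseService db) {
        // A normal ACID write: readers are not blocked while this runs,
        // and the changes are flushed to disk when the transaction commits.
        try (Transaction tx = db.beginTx()) {
            Node alice = db.createNode(Label.label("Person"));
            alice.setProperty("name", "Alice");

            Node bob = db.createNode(Label.label("Person"));
            bob.setProperty("name", "Bob");

            alice.createRelationshipTo(bob, RelationshipType.withName("KNOWS"));

            tx.success();   // mark the transaction as successful...
        }                   // ...it is committed and persisted on close
    }
}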

Neo4j also comes with a batch inserter that works directly on the stored data files. This mode does not provide transactional safety, so it cannot be used in a multithreaded write scenario. The write process is sequential, on a single write thread, without flushing to the transaction logs; hence, it gives a great boost to performance. The batch inserter is handy for importing large datasets in nontransactional scenarios, as in the sketch below.
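The following sketch of a bulk load assumes the org.neo4j.unsafe.batchinsert API as shipped with Neo4j 3.x; the store directory, labels, and properties are placeholders. Note that the store must not be in use by a running database while the batch inserter works on it.

import java.io.File;
import java.io.IOException;
import java.util.Collections;
import java.util.Map;

import org.neo4j.graphdb.Label;
import org.neo4j.graphdb.RelationshipType;
import org.neo4j.unsafe.batchinsert.BatchInserter;
import org.neo4j.unsafe.batchinsert.BatchInserters;

public class BatchInsertExample {
    public static void main(String[] args) throws IOException {
        // The batch inserter writes straight to the store files from a single
        // thread, skipping transactions and log flushes for maximum speed.
        BatchInserter inserter = BatchInserters.inserter(new File("data/bulk-load"));
        try {
            Label person = Label.label("Person");

            Map<String, Object> aliceProps = Collections.singletonMap("name", "Alice");
            long alice = inserter.createNode(aliceProps, person);

            Map<String, Object> bobProps = Collections.singletonMap("name", "Bob");
            long bob = inserter.createNode(bobProps, person);

            inserter.createRelationship(alice, bob,
                    RelationshipType.withName("KNOWS"), Collections.emptyMap());
        } finally {
            // shutdown() flushes everything to disk; skipping it corrupts the store.
            inserter.shutdown();
        }
    }
}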

Size of data: For data in Neo4j, the limitation is the size of the address space of the keys used to look up nodes, relationships, properties, and relationship types. The current address space is as follows:

  • Nodes: 2^35 (about 34 billion)
  • Relationships: 2^35 (about 34 billion)
  • Relationship types: 2^15 (about 32,000)
  • Properties: 2^36 to 2^38, depending on the property type (up to about 274 billion, but always at least about 68 billion)

Security: There may be scenarios where unauthorized access to the data, in terms of modification or theft, needs to be prevented. Neo4j has no explicitly supported data encryption methods, but Java's core encryption constructs can be used to secure data before it is stored in the system. Security can also be ensured at the level of the file system by using an encrypted data storage layer. Hence, security should be ensured at all levels of the hosted system to prevent malicious reads and writes, data corruption, and distributed denial-of-service (DDoS) attacks.
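As an illustration of the first approach, the sketch below encrypts a sensitive value with the standard javax.crypto API (AES-GCM) before it would be stored as a node property; the sample value is made up, and in practice the key would come from a key store rather than being generated on the spot.

import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Base64;

public class PropertyEncryptionExample {
    public static void main(String[] args) throws Exception {
        // Generate an AES key; in a real system this comes from a key store.
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(128);
        SecretKey key = keyGen.generateKey();

        // Encrypt the sensitive value before handing it to Neo4j.
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] cipherText = cipher.doFinal(
                "4111-1111-1111-1111".getBytes(StandardCharsets.UTF_8));

        // Store the IV alongside the ciphertext as the property value.
        String storedValue = Base64.getEncoder().encodeToString(iv)
                + ":" + Base64.getEncoder().encodeToString(cipherText);
        System.out.println(storedValue);
    }
}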
