Storage for properties

The property storage of Neo4j has been upgraded and improved over recent releases, which has made it more usable and stable. The layer now uses less disk space without compromising on features, and operations on it are faster.

The storage structure

Originally, the Neo4j property store was a doubly linked list in which each node (that is, each record in the list) held information about the list structure along with the property-related data. Each record had the following layout:

Byte(s)  Information
0        The 4 high bits of the previous pointer and the inUse flag
1        Unused
2        The 4 high bits of the next pointer
3-4      The type of the property
5-8      The index of the property
9-12     The 32 low bits of the previous pointer
13-16    The 32 low bits of the next pointer
17-24    The data for the property

So, the 8 bytes at the end were used to store the value and were sufficient to hold all primitive types, small strings, or pointer references to the dynamic store where long strings or arrays are stored. This layout is wasteful, however: the complete 8 bytes are used only when the value is a long, a double, or a string, and the previous and next references are repeated for every property. This causes significant overhead. In the newer versions of Neo4j, this was optimized by redesigning PropertyRecord so that, instead of housing a single property, it acts as a container for a number of properties of variable length. The present structure is outlined here:

Byte(s)  Information
0        The 4 high bits of the previous pointer and the 4 high bits of the next pointer
1-4      The previous property record
5-8      The next property record
9-40     Payload
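The layout in the table can be made concrete with a small parsing sketch. It is an illustration only, not Neo4j's own code: the function name parse_property_record is hypothetical, and the big-endian byte order and the choice of which nibble of byte 0 belongs to which pointer are assumptions that the table does not settle.

```python
import struct

RECORD_SIZE = 41   # 1 + 4 + 4 + 32 bytes, as listed in the table
BLOCK_COUNT = 4    # the payload is 4 blocks of 8 bytes each


def parse_property_record(raw: bytes) -> dict:
    """Split one 41-byte record into its two pointers and payload blocks."""
    if len(raw) != RECORD_SIZE:
        raise ValueError(f"expected {RECORD_SIZE} bytes, got {len(raw)}")
    # ">" means big-endian with no padding (an assumption of this sketch).
    # B = 1 byte, I = 4-byte unsigned int, Q = 8-byte unsigned int.
    high_bits, prev_low, next_low, *blocks = struct.unpack(">BII4Q", raw)
    # Assumed nibble order: upper nibble -> previous, lower nibble -> next.
    prev_ptr = ((high_bits >> 4) << 32) | prev_low
    next_ptr = ((high_bits & 0x0F) << 32) | next_low
    return {"prev": prev_ptr, "next": next_ptr, "blocks": blocks}


# Build a record by hand and read it back.
example = struct.pack(">BII4Q", 0x12, 5, 7, 10, 20, 30, 40)
record = parse_property_record(example)
```

Each pointer is 36 bits wide: the 4 high bits from byte 0 are combined with the 32 low bits from the corresponding 4-byte field.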

As you can see, the separate inUse flag has been done away with, and the space is arranged to hold the payload more efficiently. The payload consists of 4 blocks of 8 bytes, each of which is used differently depending on the type of data stored in the property. The index and the type of the property are always required, and they occupy the first 3 bytes of the block and the 4 high bits of the 4th byte, respectively. How the value is stored depends on its data type. If it is a primitive that can be accommodated in 4 bytes, the lower 4 bits of the 4th byte are skipped and the remaining 4 bytes of the block hold the value. When you store arrays and non-short strings using DynamicStore, however, the reference needs 36 bits, which takes up the lower 4 bits of the 4th byte together with the remaining 4 bytes. In both cases a property fits in a single block, so up to 4 properties can be stored in the same PropertyRecord, which increases the space efficiency of the storage. Doubles and longs are the exception: the remaining 36 bits are skipped and the subsequent block is used to store the value. This wastes some space, but it occurs rarely, and the layout is still more efficient overall than the original storage structure.
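The bit layout of a single block can be sketched as follows. This is a minimal illustration rather than Neo4j's implementation: the function names are hypothetical, the type codes are arbitrary numbers, and the sketch assumes that the 24-bit field holds the property's key index, the 4-bit field holds its type, and the block is read as one big-endian 64-bit integer.

```python
KEY_BITS, TYPE_BITS, VALUE_BITS = 24, 4, 36   # 24 + 4 + 36 = 64 bits per block


def encode_block(key_index: int, prop_type: int, value: int = 0) -> int:
    """Pack key index, type, and an inline value or 36-bit reference."""
    if not 0 <= key_index < (1 << KEY_BITS):
        raise ValueError("key index does not fit in 24 bits")
    if not 0 <= prop_type < (1 << TYPE_BITS):
        raise ValueError("type does not fit in 4 bits")
    if not 0 <= value < (1 << VALUE_BITS):
        raise ValueError("value does not fit in 36 bits")
    return (key_index << (TYPE_BITS + VALUE_BITS)) | (prop_type << VALUE_BITS) | value


def decode_block(block: int) -> tuple:
    """Return (key_index, prop_type, value) for one 8-byte block."""
    key_index = block >> (TYPE_BITS + VALUE_BITS)
    prop_type = (block >> VALUE_BITS) & ((1 << TYPE_BITS) - 1)
    value = block & ((1 << VALUE_BITS) - 1)
    return key_index, prop_type, value


def encode_wide(key_index: int, prop_type: int, value64: int) -> list:
    """Longs and doubles: leave the 36 bits empty and use the next block."""
    return [encode_block(key_index, prop_type), value64 & 0xFFFFFFFFFFFFFFFF]


# A 4-byte primitive stays below 2**32, so the 4 low bits of the
# 4th byte remain zero, matching the "skipped" bits in the text.
small = encode_block(key_index=7, prop_type=3, value=42)
# A 64-bit value costs two of the record's four blocks.
wide = encode_wide(key_index=8, prop_type=5, value64=2**40)
```

A record whose properties are all wide values therefore holds only two of them, which is the space wastage mentioned above.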

LongerShortString is a newly introduced type that extends the ShortString operating principle: a string is first scanned to figure out whether it falls within one of the supported encodings. If it does, it is encoded and stored behind a header that contains the length of the string and the ID of the encoding in the encoding table. If the string does not fit in the three and a half blocks available in the property record, it is instead stored with the UTF8 encoding scheme in DynamicStringStore.

In the case of an array, we first determine the smallest number of bits that can hold each of its values, dropping the leading zeroes in the process, and then store every element with that same length. For example, in the array [5,4,3,2,1], the elements do not take 32 bits each; rather, they are stored in 3 bits each. Similarly, only a single bit is used for each element of a Boolean-valued array. A similar concept is used for dynamic arrays. Such a data value is stored in the following format:

Number of bits    Stored information
4                 Enumeration storing the type of entity
6                 The array length
6                 Bits used for each item
The remaining 16  Data elements
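The array scheme can be tried out with the short sketch below. It packs a 4-bit type code, a 6-bit length, and a 6-bit item width, followed by the elements at that width. The function names and the type code are hypothetical, values are assumed to be non-negative, and the result is kept as a Python integer instead of being laid out in property blocks.

```python
HEADER_BITS = 4 + 6 + 6   # type enumeration, array length, bits per item


def bits_needed(values) -> int:
    """Smallest width that holds every element once leading zeroes are dropped."""
    if any(int(v) < 0 for v in values):
        raise ValueError("this sketch handles non-negative values only")
    return max([1] + [int(v).bit_length() for v in values])


def pack_array(type_code: int, values) -> tuple:
    """Return (packed_bits, total_bit_count) for the header plus the data."""
    width = bits_needed(values)
    if not 0 <= type_code < 16 or len(values) >= 64 or width >= 64:
        raise ValueError("header field overflow")
    bits = (type_code << 12) | (len(values) << 6) | width
    for v in values:
        bits = (bits << width) | int(v)   # every element uses the same width
    return bits, HEADER_BITS + width * len(values)


def unpack_array(bits: int, total_bits: int) -> tuple:
    """Inverse of pack_array: return (type_code, values)."""
    data_bits = total_bits - HEADER_BITS
    header = bits >> data_bits
    type_code, length, width = header >> 12, (header >> 6) & 0x3F, header & 0x3F
    mask = (1 << width) - 1
    values = [(bits >> (data_bits - width * (i + 1))) & mask for i in range(length)]
    return type_code, values


packed, size = pack_array(2, [5, 4, 3, 2, 1])
```

For [5,4,3,2,1] the item width is 3 bits, so the five elements need 15 bits of data instead of 160, and a Boolean array such as [True, False, True] needs 1 bit per element.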

One detail remains to be explained: the inUse flag. It is a marker that indicates whether a property exists and is in use. Each block is marked to show whether it is in use and, since properties can be deleted, a zero signifies that the property is unused or deleted. Moreover, the blocks are written to disk in a defragmented manner. So, if one property in a record that holds three is deleted, only the remaining two are written to disk, and the 4th byte of the first unused block is marked as zero to indicate that the block is not in use. If you take some time to explore the source code of the Neo4j project, you will find these implementation details and strategies in WriteTransaction.
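The defragmented write can be illustrated with the toy model below. It is not the logic from WriteTransaction: the helper names are hypothetical, every property is assumed to occupy exactly one block, and the type is assumed to sit in the 4 high bits of the 4th byte, with zero meaning "not in use".

```python
NOT_IN_USE = 0   # a type of zero marks an unused or deleted block
TYPE_SHIFT = 36  # the type sits in the 4 high bits of the 4th byte
SLOTS = 4        # blocks per property record


def make_block(key_index: int, prop_type: int, value: int) -> int:
    """Hypothetical helper: 24-bit key, 4-bit type, 36-bit value."""
    return (key_index << 40) | (prop_type << TYPE_SHIFT) | value


def block_type(block: int) -> int:
    return (block >> TYPE_SHIFT) & 0xF


def mark_deleted(block: int) -> int:
    """Clear the type bits so that the block reads as not in use."""
    return block & ~(0xF << TYPE_SHIFT)


def defragment(blocks: list) -> list:
    """Keep live blocks at the front and zero the remaining slots."""
    live = [b for b in blocks if block_type(b) != NOT_IN_USE]
    return live + [0] * (SLOTS - len(live))


# Three properties; the middle one is deleted before the record is written.
blocks = [make_block(1, 3, 10), make_block(2, 3, 20), make_block(3, 3, 30)]
blocks[1] = mark_deleted(blocks[1])
written = defragment(blocks)
```

A reader scanning the written record can stop at the first block whose type is zero, since no live block follows it.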

Migrating to the new storage

In the rare case that you are dealing with an older version of Neo4j and are considering an upgrade to the newer architecture, note that the upgrade cannot be done in place by changing, removing, or replacing the existing data. You will need to recreate the existing database in the new format. This ensures that existing files are not overwritten, guarantees crash resistance, and also backs up the data. The process is relatively simple: read all nodes and relationships along with their properties, convert them to the new format, and store them. The store shrinks significantly during migration because deleted entities are omitted, which is especially noticeable if a lot of deletions have been performed on the database and it is not restarted often. The code and logic for migration are included in the source code of the Neo4j kernel in the org.neo4j.kernel.impl.storemigration package, and you can run the migration either as part of a normal startup or in a standalone manner. You will need to set "allow_store_upgrade"="true" in your config before you can execute the migration.
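The effect on store size can be estimated with a toy model. It has nothing to do with the actual code in org.neo4j.kernel.impl.storemigration: the function names and the dictionary-based records are hypothetical, all records are assumed to belong to one entity, and every property is assumed to fit in a single block. The record sizes of 25 and 41 bytes come from the two layouts described earlier.

```python
OLD_RECORD_SIZE = 25     # one property per record in the old layout
NEW_RECORD_SIZE = 41     # up to four property blocks per record
BLOCKS_PER_RECORD = 4


def migrate(old_records: list) -> list:
    """Build a new store from the old one; the input list is never modified."""
    live = [r for r in old_records if r["in_use"]]   # deleted entries are omitted
    return [live[i:i + BLOCKS_PER_RECORD]
            for i in range(0, len(live), BLOCKS_PER_RECORD)]


def store_sizes(old_records: list) -> tuple:
    """Return (old_size_in_bytes, new_size_in_bytes)."""
    new_records = migrate(old_records)
    return (len(old_records) * OLD_RECORD_SIZE,
            len(new_records) * NEW_RECORD_SIZE)


# Ten old records, four of which were deleted but still occupy space.
old_store = [{"key": k, "value": k * 10, "in_use": k not in (1, 3, 5, 7)}
             for k in range(10)]
sizes = store_sizes(old_store)
```

In this example, the 10 old records take 250 bytes, while the 6 live properties fit in 2 new records of 41 bytes each, or 82 bytes.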
