Admin tricks

A database is the life force behind an application, and that calls for careful initial optimization based on the type and size of the data to be stored and the resources available on the server. Because Neo4j is written in Java, you also need to configure the Java-related parameters properly. In the upcoming sections, we will look at how you can tweak your system and database configurations to keep your Neo4j data store in good health.

Server configuration

For advanced usage of the Neo4j database, you can configure several parameters to keep the resources in check. The primary configuration file is conf/neo4j-server.properties. For normal development purposes, the default settings are sufficient. However, as an administrator, you can make suitable changes to the settings.

You can set the base directory on the disk where your database resides using the following property:

org.neo4j.server.database.location=/path/to/database/graph.db

The default port on which Neo4j operates is 7474. However, you can change the port used for data access, the UI, and administration with the following setting:

org.neo4j.server.webserver.port=9098

You can even configure the client access pattern depending upon the address of the Neo4j database relative to the application that uses it. This helps in restricting the use of the database to the specific application. The default value is the loopback 127.0.0.1, which can be changed with:

# allow only the client's IP to connect
org.neo4j.server.webserver.address=192.168.0.2

# any client allowed to connect
org.neo4j.server.webserver.address=0.0.0.0

You can set the rrdb (round-robin database) directory, which collects metrics on the running database instance. You can also specify the URI path used to access the database through the REST API (it is a relative path and the default value is /db/data). The following settings are used:

org.neo4j.server.webadmin.rrdb.location=data/graph.db/../rrd

org.neo4j.server.webadmin.data.uri=/db/data/

The Neo4j WebAdmin interface uses a different relative path to provide access to the management tool. You can specify the URI setting as follows:

org.neo4j.server.webadmin.management.uri=/db/manage

If the Neo4j database resides on a separate machine in the network, you can restrict the class of network addresses that can access it (IPv4 or IPv6 or both). You need to modify the settings in the conf/neo4j-wrapper.conf file. Look for the section titled Java Additional Parameters, and append the following parameter to it:

wrapper.java.additional.3=-Djava.net.preferIPv4Stack=true

To configure the number of threads that service HTTP requests, and thus the concurrency level of the Neo4j server, use the following parameter:

org.neo4j.server.webserver.maxthreads=200

The Neo4j server uses a timeout to manage orphaned or broken transactions. If no requests are received for an open transaction within the configured timeout (the default is 60 seconds), the server rolls the transaction back. You can configure it as:

org.neo4j.server.transaction.timeout=60

The main file for server configurations is conf/neo4j-server.properties. Parameters that tune the performance of the database at a low level go in a second file, neo4j.properties. You can set this file explicitly using the org.neo4j.server.db.tuning.properties=neo4j.properties parameter. If the parameter is not set, the server looks for the file in the same directory as the neo4j-server.properties file. If no file is present, the server logs a warning. When the neo4j.properties file is set and the server is restarted, the file is loaded and the database engine is configured accordingly.
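To see how the settings in this section combine with their defaults, here is a minimal Python sketch that parses properties-style text and overlays it on the defaults mentioned above. The function names and the sample text are illustrative assumptions; this is not how the Neo4j server itself loads its configuration.

```python
# Defaults taken from the text above: port 7474, loopback address, 60 s timeout.
DEFAULTS = {
    "org.neo4j.server.webserver.port": "7474",
    "org.neo4j.server.webserver.address": "127.0.0.1",
    "org.neo4j.server.transaction.timeout": "60",
}


def parse_properties(text):
    """Return key/value pairs, skipping blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that have no '='
            settings[key.strip()] = value.strip()
    return settings


def effective_settings(text):
    """Overlay the parsed settings on the defaults."""
    merged = dict(DEFAULTS)
    merged.update(parse_properties(text))
    return merged


# Hypothetical file content, reusing the example values from this section.
SAMPLE = """
# any client allowed to connect
org.neo4j.server.webserver.address=0.0.0.0
org.neo4j.server.webserver.port=9098
org.neo4j.server.webserver.maxthreads=200
"""

config = effective_settings(SAMPLE)
print(config["org.neo4j.server.webserver.port"])       # 9098 (overridden)
print(config["org.neo4j.server.transaction.timeout"])  # 60 (default kept)
```

A check like this can be handy before restarting the server, since a mistyped key simply shows up as an unexpected extra entry instead of overriding the intended setting.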

JVM configurations

Neo4j is written in Java, so the JVM settings also determine the resource constraints imposed upon the database. You can configure these properties in the conf/neo4j-wrapper.conf file under NEO4J_HOME in your installation. Here are a few common properties that you can tweak according to your requirements:

| Name of property | What it stands for |
|---|---|
| wrapper.java.initmemory | Initial size of heap (in MB) |
| wrapper.java.maxmemory | Maximum size of heap (in MB) |
| wrapper.java.additional.N | Additional literal parameter of the JVM (N is the number of each literal) |

The underlying JVM has two parameters that are used to control the main memory – one each for the stack and the heap. In Neo4j, the heap size is a critical parameter, since it controls the allocation of objects (number of objects) by the database engine. The stack, on the other hand, is the deciding factor for the depth of the call stack for the application.

Generally, the notion is that a larger heap is better. With a large heap, Neo4j can handle much larger transactions and sustain higher concurrency in transactions. Neo4j also gets faster, because a bigger section of the graph fits in the caches, which keeps the most frequently used nodes and relationships quickly accessible. With a larger heap, the node and relationship caches can be much larger as well.

However, as an admin, you need to make sure that the heap fits in the system's main memory, because performance suffers badly once paging to disk occurs. Also, if the heap is much larger than the application requires, the JVM garbage collector leaves dead objects lying around for longer. This, in turn, causes longer garbage collection pauses and latency issues, which the application does not want. In a 32-bit JVM, the default heap size is 64 megabytes, which is too small for practical applications (the default heap of a 64-bit JVM is not adequate either). Memory is a critical factor when transactions are prominent in the system. The following figure shows the memory footprints of different transaction types:

[Figure: memory footprints of different transaction types]

Depending on the cache implementation being used, a suitable heap size coupled with garbage collection can handle most of the traffic to the database. The default soft reference cache (LRU based) needs a heap larger than the data to be kept in it, so that it can cache most nodes and relationships. Otherwise, it lets the heap fill up and then triggers a garbage collection, which results in the loss of cached data. Cached data can be retained longer by using a much larger cache. If a strong reference cache is being used, then the entire graph must fit in the heap (cache). Thus, large heaps can avoid out-of-memory exceptions and maintain high overall throughput.

A weak reference cache, on the other hand, needs only enough heap to handle the peak load (average memory × peak load), and is beneficial in low-latency scenarios.
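The peak-load rule of thumb can be worked through in a few lines of Python. This is a rough sketch under stated assumptions: "average memory" is read as the average heap used per request in MB, "peak load" as the number of concurrent requests at peak, and the function names are hypothetical. Setting the initial and maximum heap to the same value is just one possible choice.

```python
def peak_heap_mb(avg_request_mb, peak_concurrent_requests):
    """Heap needed at peak: average memory per request times peak load."""
    return avg_request_mb * peak_concurrent_requests


def wrapper_heap_lines(heap_mb):
    """Render heap settings for conf/neo4j-wrapper.conf (values in MB)."""
    return [
        f"wrapper.java.initmemory={heap_mb}",
        f"wrapper.java.maxmemory={heap_mb}",
    ]


# Assumed workload: 4 MB per request on average, 200 requests at peak.
heap = peak_heap_mb(avg_request_mb=4, peak_concurrent_requests=200)
print(heap)  # 800
for line in wrapper_heap_lines(heap):
    print(line)
```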

| Number of primitives | RAM size | Heap configuration | Reserved RAM for the OS |
|---|---|---|---|
| 10M | 2GB | 512MB | The rest |
| 100M | 8GB+ | 1-4GB | 1-2GB |
| 1B+ | 16GB-32GB+ | 4GB+ | 1-2GB |
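The sizing guidelines above can be turned into a small lookup. The following Python sketch treats the first two rows as upper bounds on the number of primitives and sends anything larger to the last row; that boundary handling is an assumption, since the guidelines only give three reference points.

```python
# (upper bound on primitives, RAM size, heap configuration, RAM reserved for OS)
# None means "no upper bound"; the values are copied from the guidelines above.
GUIDELINES = [
    (10_000_000, "2GB", "512MB", "The rest"),
    (100_000_000, "8GB+", "1-4GB", "1-2GB"),
    (None, "16GB-32GB+", "4GB+", "1-2GB"),
]


def guideline_for(primitives):
    """Return the first guideline row whose bound covers the given count."""
    for bound, ram, heap, os_reserve in GUIDELINES:
        if bound is None or primitives <= bound:
            return {"ram": ram, "heap": heap, "os_reserve": os_reserve}
    raise ValueError("unreachable: the last row has no upper bound")


print(guideline_for(50_000_000)["heap"])  # 1-4GB
```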

Caches

Caches in Neo4j are of two basic types:

  • File buffer cache: It caches the storage file data in the form in which it is stored on the storage media
  • Object cache: It caches nodes, relationships, and properties to speed up traversals and transactions

Neo4j data is held in the file buffer cache in the same format as on the persistent storage medium. This cache improves read/write performance by writing to the cache and delaying writes to persistent storage until the logical log is rotated. This is a safe operation, since all store files can be recovered if a crash occurs.

Let us take a look at how data is stored in Neo4j and which files are used for storage on the underlying file system. Each Neo4j store file holds uniform records of a fixed size and a specific type:

| Store file | Record size | Contents |
|---|---|---|
| neostore.nodestore.db | 15 B | Nodes |
| neostore.relationshipstore.db | 34 B | Relationships |
| neostore.propertystore.db | 41 B | Properties for nodes and relationships |
| neostore.propertystore.db.strings | 128 B | Values of string properties |
| neostore.propertystore.db.arrays | 128 B | Values of array properties |

You can configure the size of the records during the creation of the data store with the help of the array_block_size and the string_block_size parameters. These settings come in handy when you expect to store large data records in the entities. Another advantage of these records is that you can estimate the storage requirements of the data in the graph, and calculate a rough cache size for the file buffer caches.
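Because every record has a fixed size, a storage estimate is plain multiplication. The Python sketch below uses the record sizes listed above. The record counts are made-up example figures, and they count records in each store file, which is not necessarily the same as the number of logical properties or string values.

```python
# Record sizes in bytes, as listed for each store file above.
RECORD_BYTES = {
    "neostore.nodestore.db": 15,
    "neostore.relationshipstore.db": 34,
    "neostore.propertystore.db": 41,
    "neostore.propertystore.db.strings": 128,
    "neostore.propertystore.db.arrays": 128,
}


def estimate_store_bytes(record_counts):
    """Return the estimated size in bytes of each store file."""
    return {
        name: size * record_counts.get(name, 0)
        for name, size in RECORD_BYTES.items()
    }


# Hypothetical graph: record counts per store file.
counts = {
    "neostore.nodestore.db": 1_000_000,
    "neostore.relationshipstore.db": 5_000_000,
    "neostore.propertystore.db": 3_000_000,
    "neostore.propertystore.db.strings": 500_000,
}

sizes = estimate_store_bytes(counts)
total = sum(sizes.values())
print(total)  # 372000000 bytes, roughly 372 MB
```

The per-file figures give a rough upper bound for the corresponding file buffer cache, since caching more than the file holds brings no benefit.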

A file buffer cache exists for every distinct storage file. The cache divides the file into multiple equal-sized windows, each containing an even number of records. During caching, the most active windows are held in memory, and the hit/miss ratio of each window is constantly tracked. When the ratio of an uncached window is found to be greater than that of a cached window, the cached window is evicted and replaced by the uncached one.
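The window replacement idea can be illustrated with a toy model in Python. This is only a sketch and not Neo4j's implementation: a simple per-window access counter stands in for the hit/miss ratio, and the class and method names are hypothetical.

```python
from collections import Counter


class WindowCache:
    """Toy model: keep the most active windows of one store file in memory."""

    def __init__(self, capacity):
        self.capacity = capacity   # number of windows that fit in memory
        self.cached = set()        # ids of the windows currently cached
        self.activity = Counter()  # accesses seen per window (assumed metric)

    def access(self, window):
        """Record an access; return True if the window was already cached."""
        self.activity[window] += 1
        if window in self.cached:
            return True
        if len(self.cached) < self.capacity:
            self.cached.add(window)
            return False
        # Least active cached window; ties are broken by window id.
        coldest = min(self.cached, key=lambda w: (self.activity[w], w))
        if self.activity[window] > self.activity[coldest]:
            self.cached.remove(coldest)
            self.cached.add(window)
        return False


cache = WindowCache(capacity=2)
results = [cache.access(w) for w in [0, 0, 0, 1, 2, 2]]
print(sorted(cache.cached))  # [0, 2]: window 2 became more active than window 1
```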

The object cache holds nodes, relationships, and properties in a form optimized for speedy graph traversals. Reading from the object cache is five to ten times faster than reading from the file buffer cache. As soon as a node or relationship is accessed, it is added to the object cache. However, the cached objects are populated lazily. Properties are loaded only when they are accessed, and when a node is loaded into the cache, its relationships are not loaded until they are accessed.
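The lazy population described here can be sketched in Python as follows. All class names are hypothetical, and FakeStore is merely a stand-in for the store files that records each load so the laziness becomes visible. It is not Neo4j's actual cache code.

```python
class FakeStore:
    """Stand-in for the store files; records every load it performs."""

    def __init__(self):
        self.loads = []

    def load_properties(self, node_id):
        self.loads.append(("properties", node_id))
        return {"name": f"node-{node_id}"}

    def load_relationships(self, node_id):
        self.loads.append(("relationships", node_id))
        return []


class CachedNode:
    """A cached node whose properties and relationships load on first use."""

    def __init__(self, node_id, store):
        self.node_id = node_id
        self._store = store
        self._properties = None
        self._relationships = None

    @property
    def properties(self):
        if self._properties is None:
            self._properties = self._store.load_properties(self.node_id)
        return self._properties

    @property
    def relationships(self):
        if self._relationships is None:
            self._relationships = self._store.load_relationships(self.node_id)
        return self._relationships


class ObjectCache:
    """Adds a node to the cache as soon as it is first accessed."""

    def __init__(self, store):
        self._store = store
        self._nodes = {}

    def get(self, node_id):
        if node_id not in self._nodes:
            self._nodes[node_id] = CachedNode(node_id, self._store)
        return self._nodes[node_id]


store = FakeStore()
cache = ObjectCache(store)
node = cache.get(1)
print(store.loads)      # []: nothing is loaded when the node enters the cache
print(node.properties)  # {'name': 'node-1'}
print(node.properties)  # served from the cached object, no second load
print(store.loads)      # [('properties', 1)]
```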

You can configure the object cache using the cache_type parameter to specify the type of cache implementation to be used, mentioned separately for nodes and relationships. The available options for cache types are:

| Cache type | Description |
|---|---|
| none | No high level cache is used. Object caching does not take place. |
| soft | Uses available memory optimally and is useful for high-performance traversals. If the cache size is inadequate for the frequently used parts, garbage collector issues may occur. The community edition of Neo4j has this as the default cache type. |
| weak | Provides relatively short life spans for cached objects. For applications requiring high throughput, and where the frequently accessed section of the graph cannot fully fit into memory, this is a suitable solution. |
| strong | Best option for small, completely in-memory graphs. This technique loads all data into memory without any removals or releases. |
| hpc | This refers to the high-performance cache. It dedicates memory chunks for caching nodes and relationships and is the best option in most scenarios. It facilitates fast lookups/writes and has a very small footprint. This cache type is available in, and is the default option for, the Enterprise edition of Neo4j. |

Apart from the cache_type parameter, there are a few other parameters that can be used to configure the way caches operate in Neo4j, and their resource constraints. Some of the important parameters are listed as follows:

| Configuration option | Description (what it controls) | Example value |
|---|---|---|
| cache.memory_ratio | The percent of the available memory that will be used for caching. The default is 50 percent. | 60.0 |
| node_cache_array_fraction | The dedicated fraction of the heap size for the cache array for nodes (max ten). | 8 |
| relationship_cache_array_fraction | The dedicated fraction of the heap size for the cache array for relationships (max ten). | 7 |
| node_cache_size | The maximum amount of heap memory dedicated to caching nodes. | 3G |
| relationship_cache_size | The maximum amount of heap memory dedicated to caching relationships. | 800M |
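As a small sanity check for these options, the following Python sketch renders a set of cache settings as key=value lines and rejects array fractions above the stated maximum of ten. The function name is hypothetical, the values are the example values listed above, and the assumption is that the rendered lines would go into the neo4j.properties tuning file.

```python
# Options whose value must not exceed ten, per the descriptions above.
FRACTION_KEYS = (
    "node_cache_array_fraction",
    "relationship_cache_array_fraction",
)


def render_cache_settings(settings):
    """Validate the array fractions and render key=value lines."""
    for key in FRACTION_KEYS:
        if key in settings and float(settings[key]) > 10:
            raise ValueError(f"{key} must not exceed 10, got {settings[key]}")
    return "\n".join(f"{key}={value}" for key, value in settings.items())


EXAMPLE = {
    "cache.memory_ratio": "60.0",
    "node_cache_array_fraction": "8",
    "relationship_cache_array_fraction": "7",
    "node_cache_size": "3G",
    "relationship_cache_size": "800M",
}

rendered = render_cache_settings(EXAMPLE)
print(rendered)
```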
