Memory and cache

by Claudio Tesoriero | September 2013 | Java Open Source

This article by Claudio Tesoriero, the author of Getting Started with OrientDB, discusses memory and cache tuning in OrientDB.

OrientDB uses more than one cache: one for each open connection and one shared among all connections. In addition, OrientDB uses memory-mapped files to speed up data access. The trick is therefore to find the right balance among the caches, the memory mapping, and the heap memory used by the JVM.

To set the amount of memory available to memory-mapped files, use the file.mmap.maxMemory configuration property. For example, on a 32-bit machine the maximum addressable memory is 4 GB, so the heap size and the memory-mapped size together cannot exceed 4 GB. Keep in mind, however, that if your server does not have enough physical memory, the operating system may swap OrientDB out and performance will degrade. On a 64-bit architecture, OrientDB by default sets the file.mmap.maxMemory value automatically to:

(maxOsMemory - maxProcessMemory) / 2
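
As a worked example of the formula above, the following sketch computes the default for a hypothetical machine. The class and method names are illustrative and are not OrientDB code. The figures are assumed inputs: maxOsMemory is taken to be the physical RAM and maxProcessMemory the JVM maximum heap, which is an assumption about what the two terms mean.

```java
// Worked example of (maxOsMemory - maxProcessMemory) / 2.
// MmapDefault is a hypothetical helper, not part of OrientDB.
final class MmapDefault {
    private static final long GB = 1024L * 1024L * 1024L;

    /** Applies the formula quoted in the article to two byte counts. */
    static long defaultMmapBytes(long maxOsMemory, long maxProcessMemory) {
        return (maxOsMemory - maxProcessMemory) / 2;
    }

    public static void main(String[] args) {
        // Assumed figures: 8 GB of RAM and a JVM heap capped at 2 GB.
        long mmap = defaultMmapBytes(8 * GB, 2 * GB);
        System.out.println(mmap / GB + " GB"); // prints "3 GB"
    }
}
```

With these figures, half of the 6 GB not reserved for the heap goes to memory mapping, and the other half is left to the operating system and other processes.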

This computation is in the autoConfig() method of the OGlobalConfiguration.java file. You can also enable or disable the level 1 cache, the level 2 cache, or both, and set the number of records stored in each level, as follows:

  • cache.level1.size: This sets the number of records to be stored in the level 1 caches (default -1, no limit)
  • cache.level2.size: This sets the number of records to be stored in the level 2 cache (default -1, no limit)
  • cache.level1.enabled: This is a boolean value that enables or disables the level 1 cache (default true)
  • cache.level2.enabled: This is a boolean value that enables or disables the level 2 cache (default true)
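
These properties are plain key/value settings. One way to apply them, assuming your OrientDB release reads its configuration from JVM system properties (verify this for your version), is to pass them as -D arguments when the JVM is launched. The sketch below only builds such arguments. The class name and the chosen values are illustrative.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper that turns configuration properties into JVM arguments.
final class OrientJvmArgs {
    /** Renders each key/value pair as a -Dkey=value argument, in map order. */
    static List<String> toJvmArgs(Map<String, String> settings) {
        List<String> args = new ArrayList<>();
        for (Map.Entry<String, String> e : settings.entrySet()) {
            args.add("-D" + e.getKey() + "=" + e.getValue());
        }
        return args;
    }

    public static void main(String[] args) {
        // Example values only: disable the per-connection cache and
        // cap the shared cache at 100000 records.
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("cache.level1.enabled", "false");
        settings.put("cache.level2.size", "100000");
        System.out.println(toJvmArgs(settings));
        // prints [-Dcache.level1.enabled=false, -Dcache.level2.size=100000]
    }
}
```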

Mapping files

OrientDB uses NIO to map data files in memory. However, you can change the way this mapping is performed. This is achieved by modifying the file access strategy.

  • Mode 0: It uses the memory mapping for all the operations.
  • Mode 1 (default): It uses the memory mapping, but new reads are mapped only if there is enough memory; otherwise the regular Java NIO file read/write is used.
  • Mode 2: It uses the memory mapping only if the data has been previously loaded.
  • Mode 3: It uses the memory mapping while space is available, then uses the regular Java NIO file read/write.
  • Mode 4: It disables all the memory mapping techniques.

To set the strategy mode, you must use the file.mmap.strategy configuration property.
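
The two access paths that these modes choose between can be seen with the plain JDK. The following sketch is not OrientDB code: it reads the same temporary file once through a memory mapping and once through regular NIO channel reads. The class name and the file content are invented for the illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class NioReadModes {
    /** Reads the whole file through a read-only memory mapping. */
    static byte[] readMapped(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte[] out = new byte[map.remaining()];
            map.get(out);
            return out;
        }
    }

    /** Reads the whole file with ordinary channel reads into a heap buffer. */
    static byte[] readRegular(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate((int) ch.size());
            while (buf.hasRemaining() && ch.read(buf) != -1) {
                // keep reading until the buffer is full or end of file
            }
            return buf.array();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("records", ".dat");
        try {
            Files.write(file, "record-1;record-2".getBytes(StandardCharsets.UTF_8));
            System.out.println(new String(readMapped(file), StandardCharsets.UTF_8));
            System.out.println(new String(readRegular(file), StandardCharsets.UTF_8));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Both calls return the same bytes. The difference is where the data lives: a mapping is backed by the operating system's page cache outside the Java heap, while a regular read copies the data into a buffer on the heap.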

Connections

When you connect to a remote database, you have some options to improve your application's performance. You can use connection pools and define the timeout for acquiring a new connection. The pool has two attributes:

  • minPool: It is the minimum number of open connections
  • maxPool: It is the maximum number of open connections

When the first connection is requested from the pool, a number of connections equal to the minPool attribute are opened against the server. If a thread requires a new connection, the request is satisfied by using a connection from the pool. If all the connections are busy, a new one is created, until the value of maxPool is reached. After that, the thread waits until a connection is freed. The minimum and maximum numbers of connections are defined by the client.channel.minPool (default value 1) and client.channel.maxPool (default value 5) properties. However, you can override these values in the client code by using the setProperty() method of the connection class. For example:

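import com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx;

// Override the pool size for this connection before opening it
ODatabaseDocumentTx database = new ODatabaseDocumentTx("remote:localhost/demo");
database.setProperty("minPool", 10);
database.setProperty("maxPool", 50);
database.open("admin", "admin");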
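
The minPool/maxPool behaviour described above can be modelled in a few lines. The following is a toy sketch, not OrientDB's pool implementation: the ToyPool class is hypothetical and its "connections" are just integers. It opens minPool connections on the first request, grows up to maxPool, and then blocks callers until a connection is released.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of a connection pool with minPool/maxPool limits.
final class ToyPool {
    private final int minPool;
    private final int maxPool;
    private final Deque<Integer> idle = new ArrayDeque<>();
    private int opened = 0;
    private boolean initialized = false;

    ToyPool(int minPool, int maxPool) {
        if (minPool < 0 || maxPool < 1 || minPool > maxPool) {
            throw new IllegalArgumentException("need 0 <= minPool <= maxPool and maxPool >= 1");
        }
        this.minPool = minPool;
        this.maxPool = maxPool;
    }

    synchronized int acquire() throws InterruptedException {
        if (!initialized) {
            // First request: open minPool connections up front.
            while (opened < minPool) idle.push(++opened);
            initialized = true;
        }
        while (idle.isEmpty()) {
            if (opened < maxPool) return ++opened; // all busy: open one more
            wait(); // at maxPool: block until release() is called
        }
        return idle.pop();
    }

    synchronized void release(int connection) {
        idle.push(connection);
        notify(); // wake one waiting thread, if any
    }

    synchronized int openedCount() {
        return opened;
    }

    public static void main(String[] args) throws InterruptedException {
        ToyPool pool = new ToyPool(2, 3);
        pool.acquire();
        System.out.println(pool.openedCount()); // prints 2
        pool.acquire();
        pool.acquire();
        System.out.println(pool.openedCount()); // prints 3
    }
}
```

In this model a fourth acquire() without a prior release() would block, which is the situation where an acquire timeout becomes relevant.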

You can also change the connection timeout values. You may experience problems if there is network latency or if some server-side operations take longer to complete. Generally, these kinds of problems show up in the logfile as warnings:

WARNING: Connection re-acquired transparently after XXXms and Y
retries: no errors will be thrown at application level

You can try to change the network.lockTimeout and network.socketTimeout values. The first is the timeout in milliseconds to acquire a lock against a channel (default is 15000); the second is the TCP/IP socket timeout in milliseconds (default is 10000). There are some other properties you can modify to resolve network issues. These are as follows:

  • network.socketBufferSize: This is the TCP/IP socket buffer size in bytes (default 32 KB)
  • network.retry: This indicates the number of retries a client makes to establish a connection against a server (default is 5)
  • network.retryDelay: This indicates the number of milliseconds a client will wait before retrying to establish a new connection (default is 500)
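
The interplay of network.retry and network.retryDelay can be pictured as a simple retry loop. The sketch below is a generic illustration, not the OrientDB client code: RetryingConnector and connectWithRetry are hypothetical names, and it assumes that the retry count means one initial attempt followed by up to that many retries.

```java
import java.util.concurrent.Callable;

// Generic retry loop parameterized like network.retry / network.retryDelay.
final class RetryingConnector {
    /**
     * Runs the attempt once, then retries up to `retries` times,
     * sleeping `retryDelayMs` milliseconds before each retry.
     */
    static <T> T connectWithRetry(Callable<T> attempt, int retries, long retryDelayMs)
            throws Exception {
        if (retries < 0) throw new IllegalArgumentException("retries must be >= 0");
        Exception lastFailure = null;
        for (int tryNo = 0; tryNo <= retries; tryNo++) {
            if (tryNo > 0) Thread.sleep(retryDelayMs);
            try {
                return attempt.call();
            } catch (Exception e) {
                lastFailure = e; // remember the failure and try again
            }
        }
        throw lastFailure;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated server that becomes reachable on the third attempt.
        String result = connectWithRetry(() -> {
            if (++calls[0] < 3) throw new java.io.IOException("server not reachable");
            return "connected";
        }, 5, 500);
        System.out.println(result + " after " + calls[0] + " attempts");
        // prints "connected after 3 attempts"
    }
}
```

With the defaults quoted above (5 retries, 500 ms delay), a client following this scheme would give up after roughly 2.5 seconds of waiting plus the time spent in the attempts themselves.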

Transactions

If your primary objective is performance, avoid using transactions. However, if you need transactions to group operations, you can increase overall performance by disabling the transaction log. To do so, set the tx.useLog property to false.

If you disable the transaction log, OrientDB cannot roll back operations if the JVM crashes.

Other transaction parameters are as follows:

  • tx.log.synch: It is a Boolean value. If set, OrientDB executes a sync against the filesystem for each transaction log entry. This slows down the transactions, but provides reliability on unreliable devices. Default value is false.
  • tx.commit.synch: It is a Boolean value. If set, it performs a storage synch after a commit. Default value is true.

Massive insertions

If you want to do a massive insertion, there are some tricks to speed up the operation. First of all, do it via the Java API. This is the fastest way to communicate with OrientDB. Second, tell the server what you intend to do:

import com.orientechnologies.orient.core.intent.OIntentMassiveInsert;

db.declareIntent(new OIntentMassiveInsert()); // announce the bulk load
// your code here...
db.declareIntent(null); // clear the intent when done

Here db is an opened database connection.

By declaring the OIntentMassiveInsert() intent, you are instructing OrientDB to reconfigure itself (that is, to apply a set of preconfigured configuration values) because a massive insert operation is about to begin. During the massive insert, avoid creating a new ODocument instance for each record you insert. Instead, create a single instance the first time, and then clean it using the reset() method:

import com.orientechnologies.orient.core.record.impl.ODocument;

ODocument doc = new ODocument(); // single instance, reused for every record
for (int i = 0; i < 9999999; i++) {
    doc.reset(); // here you will reset the ODocument instance
    doc.setClassName("Author");
    doc.field("id", i);
    doc.field("name", "John");
    doc.save();
}

This trick works only in a non-transactional context.

Finally, avoid transactions if you can. If you are using a graph database and you have to perform a massive insertion of vertices, you can still reset just one vertex:

ODocument doc = db.createVertex();
...
doc.reset();
...

Moreover, since a graph database caches the most used elements, you may want to disable this caching during the insertion:

db.setRetainObjects(false);

Datafile fragmentation

Each time a record is updated or deleted, a hole is created in the datafile structure. OrientDB tracks these holes and tries to reuse them. However, many updates and deletes can fragment the datafiles, just as in a filesystem. To limit this problem, set the oversize attribute of the classes you create. The oversize attribute allocates extra space for records when they are created, so as to avoid fragmentation upon updates. It is a multiplying factor, where 1.0 or 0 means no oversize. The default values are 0 for documents and 2 for vertices.

OrientDB has a defrag algorithm that starts automatically when certain conditions are met. You can set some of these conditions by using the following configuration parameter:

  • file.defrag.holeMaxDistance: It defines the maximum distance in bytes between two holes that triggers the defrag procedure. The default is 32 KB, -1 means dynamic size. The dynamic size is computed in the ODataLocal class in the getCloserHole() method, as Math.max(32768 * (int) (size / 10000000), 32768), where size is the current size of the file.
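
To get a feel for the dynamic size, the expression quoted above can be evaluated for a few file sizes. The sketch below copies only that expression. The HoleDistance class and its method name are illustrative and are not the ODataLocal source.

```java
// Evaluates the dynamic hole distance expression quoted in the text.
final class HoleDistance {
    /** size is the current size of the datafile in bytes. */
    static int dynamicHoleMaxDistance(long size) {
        return Math.max(32768 * (int) (size / 10000000), 32768);
    }

    public static void main(String[] args) {
        System.out.println(dynamicHoleMaxDistance(5000000L));   // prints 32768
        System.out.println(dynamicHoleMaxDistance(25000000L));  // prints 65536
        System.out.println(dynamicHoleMaxDistance(100000000L)); // prints 327680
    }
}
```

In other words, the distance never drops below 32 KB and grows by 32 KB for every full 10,000,000 bytes of file size.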

The profiler

OrientDB has an embedded profiler that you can use to analyze the behavior of the server. The configuration parameters that act on the profiler are as follows:

  • profiler.enabled: This is a boolean value that enables or disables the profiler. The default value is false.
  • profiler.autoDump.interval: It is the number of seconds between profiler dumps. The default value is 0, which means no dump.
  • profiler.autoDump.reset: This is a boolean value; if true, the profiler is reset at every dump. The default is true.

The dump is a JSON string structured in sections. The first one is a huge collection of information gathered at runtime related to the configuration and resources used by each object in the database. The keys are structured as follows:

  • db.<db-name>: Database-related metrics
  • db.<db-name>.cache: Metrics about the database's caching
  • db.<db-name>.data: Metrics about the database's datafiles, mainly data holes
  • db.<db-name>.index: Metrics about the database's indexes
  • system.disk: Filesystem-related metrics
  • system.memory: RAM-related metrics
  • system.config.cpus: The number of CPU cores
  • process.network: Network metrics
  • process.runtime: Process runtime information and metrics
  • server.connections.actives: The number of active connections

The second part of the dump is a collection of chronos. A chrono is a log of an operation, for example, a create operation, an update operation, and so on. Each chrono has the following attributes:

  • last: It is the last time recorded
  • min: It is the minimum time recorded
  • max: It is the maximum time recorded
  • average: It is the average time recorded
  • total: It is the total time recorded
  • entries: It is the number of times the specific metric has been recorded

Finally, there are sections about many counters.
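
The chrono attributes listed above are all simple running statistics over the recorded times. The following sketch shows how such a record could be maintained. The Chrono class here is a hypothetical model written for this illustration, not the profiler's own class, and the time unit is left unspecified.

```java
// Minimal model of a chrono: running statistics over recorded durations.
final class Chrono {
    private long last;
    private long min = Long.MAX_VALUE;
    private long max = Long.MIN_VALUE;
    private long total;
    private long entries;

    /** Records one more timing for the operation. */
    void record(long elapsed) {
        last = elapsed;
        min = Math.min(min, elapsed);
        max = Math.max(max, elapsed);
        total += elapsed;
        entries++;
    }

    long last() { return last; }
    long min() { return min; }
    long max() { return max; }
    long total() { return total; }
    long entries() { return entries; }
    long average() { return entries == 0 ? 0 : total / entries; }

    public static void main(String[] args) {
        Chrono update = new Chrono();
        for (long t : new long[] {10, 30, 20}) update.record(t);
        System.out.println("last=" + update.last() + " min=" + update.min()
                + " max=" + update.max() + " average=" + update.average()
                + " total=" + update.total() + " entries=" + update.entries());
        // prints "last=20 min=10 max=30 average=20 total=60 entries=3"
    }
}
```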

Query tips

The following sections give some useful tips on how to optimize query execution.

The explain command

You can see how OrientDB accesses the data by using the explain command in the console. To use this command simply write explain followed by the select statement:

orientdb> explain select from Posts

A set of key-value pairs is returned. The keys have the following meanings:

  • resultType: It is the type of the returned resultset. It can be collection, document, or number.
  • resultSize: It is the number of records retrieved if the resultType is collection.
  • recordReads: It is the number of records read from datafiles.
  • involvedIndexes: It is the list of indices involved in the query.
  • indexReads: It is the number of records read from the indices.
  • documentReads: It is the number of documents read from the datafiles. This number could be different from recordReads, because in a scanned cluster there can be different kinds of records.
  • documentAnalyzedCompatibleClass: It is the number of documents analyzed that belong to the class requested by the query. This number could be different from documentReads, because a cluster may contain several different classes.
  • elapsed: It is the time elapsed to execute the statement, measured in nanoseconds.

As you can see, OrientDB can use indices to speed up the reads.

Indexes

You can define indexes, as you would in a relational database, by using the create index statement, or via the Java API by using the createIndex() method of the OClass class:

create index <class>.<property> [unique|notunique|fulltext] [field type]

Or, for a composite index (an index on more than one property):

create index <index_name> on <class> (<field1>,<field2>)
[unique|notunique|fulltext]

If you create a composite index, OrientDB will also use it when the where clause does not specify criteria against all the indexed fields. So, if you have already built a composite index, you may not need to build a separate index for each field you use in your queries. This is called a partial match search, and further information about it can be found in the OrientDB wiki at https://github.com/nuvolabase/orientdb/wiki/Indexes#partial-match-search.
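
To see why a composite index can answer a query on only part of its fields, it helps to picture the index as a sorted map keyed by the concatenated field values. The sketch below uses a java.util.TreeMap as a stand-in. It is a conceptual model only, not OrientDB's index implementation, and the class, field values, and record IDs are invented. In this model, the partial search works for the leading field of the key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Conceptual model of a unique composite index on (field1, field2).
final class CompositeIndexModel {
    private static final char SEP = '\u0000'; // sorts before any ordinary character
    private final TreeMap<String, String> entries = new TreeMap<>();

    void put(String field1, String field2, String recordId) {
        entries.put(field1 + SEP + field2, recordId);
    }

    /** Full match: both indexed fields are given. */
    String findExact(String field1, String field2) {
        return entries.get(field1 + SEP + field2);
    }

    /** Partial match: only the leading field is given, so scan a key range. */
    List<String> findByFirstField(String field1) {
        String from = field1 + SEP;             // smallest key starting with field1
        String to = field1 + (char) (SEP + 1);  // just past the last such key
        return new ArrayList<>(entries.subMap(from, true, to, false).values());
    }

    public static void main(String[] args) {
        CompositeIndexModel index = new CompositeIndexModel();
        index.put("Smith", "Anna", "#9:0");
        index.put("Smith", "Bob", "#9:1");
        index.put("Smithson", "Carl", "#9:2");
        System.out.println(index.findExact("Smith", "Bob"));  // prints #9:1
        System.out.println(index.findByFirstField("Smith"));  // prints [#9:0, #9:1]
    }
}
```

Because the keys are sorted, all entries that share the same first field are adjacent, so a range scan finds them without a second index.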

Generally, indexes don't work with the like operator. If you want the following query to use an index, you must define a FULLTEXT index on the name field:

select from Authors where name like 'j%'

FULLTEXT indices let you index string fields. Keep in mind, however, that indices slow down insert, update, and delete operations.

Summary

In this article we have seen some strategies that try to optimize both the OrientDB server installation and queries.

About the Author
Claudio Tesoriero

Claudio Tesoriero is an OrientDB Certified Developer and a senior software engineer with twenty years' experience in Information Technology. His first experience was with the Italian Ministry of the Treasury, then he worked for the Bull Group (www.bull.com) and got involved in projects developed for Telecom Italia (www.telecomitalia.it) and in R&D projects developed in collaboration with the Rome Tor Vergata University. He then worked for FutureSpace Spa (www.futurespace.it) and he participated in the implementation of various projects for the government administration. Currently, he is the cofounder of BaasBox, a solution of Backend as a Service based on the Play! Framework and OrientDB.
