Ehcache has a bulk loading mode that dramatically speeds up bulk loading into caches using the Terracotta Server Array.
Bulk loading is designed to be used for:
The characteristics of bulk loading are that
With bulk loading, the API for putting data into the cache stays the same. Just use cache.put(...), cache.load(...) or cache.loadAll(...).
What changes is that there is a special mode that suspends Terracotta's normal coherence guarantees and provides optimised flushing to the Terracotta Server Array (the L2 cache).
This mode can be enabled programmatically or statically in ehcache.xml. Programmatically, four methods control coherent behaviour: setNodeCoherence(boolean mode), isNodeCoherent(), isClusterCoherent() and waitUntilClusterCoherent().
setNodeCoherence(false) sets coherence to false for the Ehcache node. The setting for the rest of the cluster stays the same. The effect is that normal read and write locks are not obtained. setNodeCoherence(true) brings back the cache to coherent mode for the node.
Use isNodeCoherent() to find out whether the node is in coherent mode locally. This does not account for other nodes in the cluster (if any). The node may be coherent while the cache is incoherent cluster-wide (for example, because some other node is incoherent).
isClusterCoherent() reflects whether the cache is in coherent or incoherent mode cluster-wide. Coherent cluster-wide means that all nodes in the cluster are using the cache in coherent mode. If even one of the nodes is using the cache in incoherent mode, the cache is incoherent cluster-wide.
Calling waitUntilClusterCoherent() blocks the calling thread until the cache becomes coherent cluster-wide.
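The four methods above suggest a simple load pattern: switch the node to incoherent mode, put the data, switch back, then wait for the cluster. The following is a minimal sketch of that pattern, not the Ehcache implementation. So that it runs without the Ehcache jar or a cluster, it uses a hypothetical stand-in interface (CoherenceControl) that mirrors the method names given in this document, and a toy single-node class (InMemoryCoherentCache); the real signatures may differ between Ehcache versions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for the four coherence methods named above, plus put(). */
interface CoherenceControl {
    void put(String key, String value);
    void setNodeCoherence(boolean coherent);
    boolean isNodeCoherent();
    boolean isClusterCoherent();
    void waitUntilClusterCoherent();
}

/** Toy single-node implementation so the sketch runs without a cluster. */
class InMemoryCoherentCache implements CoherenceControl {
    final Map<String, String> store = new LinkedHashMap<>();
    private boolean coherent = true;

    public void put(String key, String value) { store.put(key, value); }
    public void setNodeCoherence(boolean coherent) { this.coherent = coherent; }
    public boolean isNodeCoherent() { return coherent; }
    // With a single node, the "cluster" is coherent exactly when this node is.
    public boolean isClusterCoherent() { return coherent; }
    public void waitUntilClusterCoherent() {
        // A real cluster would block here; the toy version fails fast instead.
        if (!coherent) {
            throw new IllegalStateException("node is still incoherent");
        }
    }
}

class BulkLoadSketch {
    /** Loads all entries with node coherence switched off, then restores it. */
    static void bulkLoad(CoherenceControl cache, Map<String, String> entries) {
        cache.setNodeCoherence(false);            // local node only
        try {
            for (Map.Entry<String, String> e : entries.entrySet()) {
                cache.put(e.getKey(), e.getValue());   // same put API as usual
            }
        } finally {
            cache.setNodeCoherence(true);         // restore even if a put fails
        }
        cache.waitUntilClusterCoherent();         // wait for the other nodes too
    }

    public static void main(String[] args) {
        InMemoryCoherentCache cache = new InMemoryCoherentCache();
        Map<String, String> entries = new LinkedHashMap<>();
        entries.put("k1", "v1");
        entries.put("k2", "v2");
        bulkLoad(cache, entries);
        System.out.println(cache.store.size() + " entries loaded, cluster coherent: "
                + cache.isClusterCoherent());
    }
}
```

Restoring coherence in a finally block is a design choice of this sketch: it keeps a failed load from leaving the node in incoherent mode.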
waitUntilClusterCoherent
Waits until every node is coherent. It does not return until the entire cluster is coherent.
setNodeCoherence(true | false)
This affects the local node only. The settings in the rest of the cluster are not affected.
To restore coherence, call the method again with true.
That call does not return until all the transactions are flushed to the cluster. Only the calling thread is blocked. This way you know when coherence is restored.
The initial state is taken from the configuration.
In a local standalone cache, setNodeCoherence should throw an UnsupportedOperationException. waitUntilClusterCoherent will also throw an UnsupportedOperationException.
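Code that may run against either a clustered or a standalone cache can guard the call. The sketch below assumes the exception behaviour described above and uses a hypothetical one-method interface (NodeCoherence) in place of the real cache class, so that it compiles on its own; tryEnterBulkLoadMode is likewise an illustrative name.

```java
/** Hypothetical stand-in for the coherence setter described above. */
interface NodeCoherence {
    void setNodeCoherence(boolean coherent);
}

class CoherenceFallback {
    /**
     * Tries to switch the node to incoherent mode.
     * Returns true if the switch happened, false if the cache does not support it.
     */
    static boolean tryEnterBulkLoadMode(NodeCoherence cache) {
        try {
            cache.setNodeCoherence(false);
            return true;
        } catch (UnsupportedOperationException e) {
            // Standalone cache: there is no coherence to suspend, so just load normally.
            return false;
        }
    }

    public static void main(String[] args) {
        NodeCoherence standalone = coherent -> {
            throw new UnsupportedOperationException("not clustered");
        };
        NodeCoherence clustered = coherent -> { /* accepts the mode change */ };
        System.out.println("standalone switched: " + tryEnterBulkLoadMode(standalone));
        System.out.println("clustered switched: " + tryEnterBulkLoadMode(clustered));
    }
}
```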
Coherent mode may be set by default in the configuration.
The terracotta element has an attribute coherent which can be true or false. By default it is true.
Ehcache 1.7 introduced a partial implementation of this feature for reads only. That is the coherentReads attribute. It is still honoured but deprecated.
Writes can also be synchronous or asynchronous. This is controlled by the synchronousWrites attribute. When you are running in incoherent mode, synchronousWrites is ignored: writes are always asynchronous.
The performance improvement is an order of magnitude.
ehcacheperf (Spring Pet Clinic) now has a bulk load test which shows the performance improvement for using a Terracotta cluster.
It is not necessary to create multiple threads when calling cache.put. Only a marginal performance improvement will result, because the call is already so fast.
It is only necessary if the source is slow. By reading from the source in multiple threads a speed up could result. An example is a database, where multiple reading threads will often be better.
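The slow-source case can be sketched as follows: several reader threads fetch from the source in parallel, and each puts its result into the cache as soon as it has it. This is an illustration under stated assumptions, not code from Ehcache. A ConcurrentHashMap stands in for the cache, the slow source is simulated with a short sleep, and the names ParallelSourceLoader and load are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntFunction;

class ParallelSourceLoader {
    /**
     * Reads ids 0..count-1 from a slow source on several threads and
     * puts each result into the cache as soon as it has been read.
     */
    static void load(Map<Integer, String> cache, IntFunction<String> slowSource,
                     int count, int readerThreads) {
        ExecutorService readers = Executors.newFixedThreadPool(readerThreads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int id = 0; id < count; id++) {
                final int key = id;
                pending.add(readers.submit(() -> {
                    String value = slowSource.apply(key); // slow part, runs in parallel
                    cache.put(key, value);                // fast part
                }));
            }
            for (Future<?> f : pending) {
                f.get();                                  // surface any read failure
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while loading", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a source read failed", e.getCause());
        } finally {
            readers.shutdown();
        }
    }

    public static void main(String[] args) {
        Map<Integer, String> cache = new ConcurrentHashMap<>();
        // Simulated slow database lookup (an assumption for the example).
        IntFunction<String> slowSource = id -> {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "row-" + id;
        };
        load(cache, slowSource, 100, 8);
        System.out.println("loaded " + cache.size() + " entries");
    }
}
```

The thread pool is sized for the source, not for the cache: the number of reader threads should match what the database or other source can serve concurrently.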
The implementation scales very well when the load is split up across multiple Ehcache CacheManagers on multiple machines.
You can add extra nodes for bulk loading to get up to a 93-fold performance improvement.
Terracotta clustering provides coherence, scaling and durability. Some applications require coherence; others do not need it for some caches, such as those holding reference data. It is possible to run a cache permanently in incoherent mode.
In ehcache.xml, set the coherent attribute to false in the terracotta element. The terracotta element is a sub-element of cache, so this can be configured per cache.
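As an illustration, a cache that runs permanently in incoherent mode might be declared like this. The cache name and the sizing attributes are placeholder values chosen for the example; only the coherent attribute on the terracotta element comes from the description above.

```xml
<!-- Hypothetical cache definition; name and sizing values are placeholders. -->
<cache name="referenceData"
       maxElementsInMemory="10000"
       eternal="true"
       overflowToDisk="false">
    <!-- Per-cache setting: this cache starts in incoherent mode. -->
    <terracotta coherent="false"/>
</cache>
```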
The bulk loading feature is in the ehcache-core module but only provides a performance improvement to Terracotta clusters (as bulk loading to Ehcache standalone is very fast already).
Download here.
For a full distribution enabling connection to the Terracotta Server Array, download here.
Saravanan, who was the lead on this feature, has blogged about it here.