Understand and leverage the serialization subsystem in Ehcache 3

The Ehcache 3 documentation at Serializers gives you an overview of how to use custom serializers with a cache. The section on Persistent and Transient Serializers briefly covers the serializer contracts that must be honored while writing custom serializers to be used with persistent/transient caches.

This article explains how you can write a transient/persistent custom serializer that works with Ehcache. Here we discuss the significance of transient serializers and persistent serializers in detail through some practical examples.

Serializer types

As indicated in the Ehcache documentation, serializers require a single argument constructor or a double-argument constructor or both based on the type of cache they are used in. The single-argument constructor is fit to be used with transient caches and the ones with the double-argument constructor can be used with persistent caches. An implementation having both the constructors can be used with both persistent and transient caches.

Hmm…​ So what does that really mean?

If you look at the custom serializer implementations in the GettingStarted samples they are all have both the constructors and if you look at the code they don’t do anything different. It’s all standard java serialization.

  • So what difference do the constructors make?

  • What is a transient serializer with single-argument constructor?

  • How do I implement a persistent serializer with the double-argument constructor?

  • When would I use both?

Read along for the answers to these questions.

These constructors are associated with the state of the serializer implementations. So if your custom serializer doesn’t have any state associated with it; that affects the serialization and deserialization logic; then that is a serializer implementation that can safely be used with transient and persistent caches. Such serializers would have both the constructors. If you look at the LongSerializer or StringSerializer implementations in the GettingStarted samples, they don’t have any state that the serialization and deserialization depend on.

So what are these serializers with state? I’ll try to explain that with some examples in the subsequent sections.

The code samples in this article were compiled and tested with Ehcache v3.0.0. Complete samples can be found at https://github.com/albinsuresh/ehcache-demo

Stateful serializers

I have an application that deals with fruits. So I have a fruits cache Cache<Long, String> that holds the mappings from fruit ids to fruit names. If this cache is a multi-tiered one then the keys and values will be stored in their serialized form in the non-heap tiers. For simplicity I’ll restrict the scope of our discussion only to the values that are fruit names of type String. I can use standard Java serialization to serialize these values. But for some reason I wanted to reduce the amount of serialized data. So instead of serializing the strings directly I decided to map all the fruit names to some integer and store those serialized integers instead of strings thinking that it’d save some space(dumb, huh?). Since this serializer is designed specifically for the fruits cache, I was fairly confident that the integer range would be more than enough to handle all possible fruit names on this planet. And here is the serializer implementation that I came up with:

public class SimpleTransientStringSerializer implements Serializer<String> {

  protected Map<Integer, String> idStringMap = new HashMap<Integer, String>();
  protected Map<String, Integer> stringIdMap = new HashMap<String, Integer>();
  protected int id = 0;

  public SimpleTransientStringSerializer(ClassLoader loader) {
    //no-op
  }

  @Override
  public ByteBuffer serialize(final String object) throws SerializerException {
    Integer currentId = stringIdMap.get(object);
    if(currentId == null) {
      stringIdMap.put(object, id);
      idStringMap.put(id, object);
      currentId = id++;
    }

    ByteBuffer buff = ByteBuffer.allocate(4);
    buff.putInt(currentId).flip();
    return buff;
  }

  @Override
  public String read(final ByteBuffer binary) throws ClassNotFoundException, SerializerException {
    Integer mapping = binary.getInt();
    String obj = idStringMap.get(mapping);
    if(obj == null) {
      throw new SerializerException("Unable to serialize: " + binary.array() + ". No value mapping found for " + mapping);
    }
    return obj;
  }

  @Override
  public boolean equals(final String object, final ByteBuffer binary) throws ClassNotFoundException, SerializerException {
    return object.equals(read(binary));
  }
}

In short this is what the above serializer does: Whenever it gets a string(the fruit name, in our application) to be serialized it checks if there is a mapping that exists already for that name in stringIdMap. If yes, the mapped integer is retrieved from the map and that integer value is serialized. If a mapping is not found, we generate a new id for the new fruit name add it to both the maps that we preserve (stringIdMap and idStringMap) and then serialize this newly generated id. Now on deserialization, the same idStringMap map is used to retrieve the fruit names from the deserialised integer values.

So in the above serializer, the idStringMap, stringIdMap and the id constitutes the state of the serializer. The serialization and deserialization depends on this state and would not work properly without that state. This serializer has the single-argument constructor making it fit to be used with transient caches. So now that we have a state-full serializer understanding the idea of transient and persistent serializers would be simpler.

Here is a sample code that uses the SimpleTransientStringSerializer with a cache:

CacheManager cacheManager = CacheManagerBuilder.newCacheManagerBuilder().build(true);
CacheConfiguration<Long, String> cacheConfig =
    CacheConfigurationBuilder.newCacheConfigurationBuilder(
        Long.class, String.class, ResourcePoolsBuilder.heap(10).offheap(5, MemoryUnit.MB))  (1)
    .withValueSerializer(SimpleTransientStringSerializer.class)   (2)
    .build();

Cache<Long, String> fruitsCache = cacheManager.createCache("fruitsCache", cacheConfig);
fruitsCache.put(1L, "apple");
fruitsCache.put(2L, "orange");
fruitsCache.put(3L, "mango");
assertThat(fruitsCache.get(1L), is("apple"));   (3)
assertThat(fruitsCache.get(3L), is("mango"));
assertThat(fruitsCache.get(2L), is("orange"));
assertThat(fruitsCache.get(1L), is("apple"));
1 Create a multi-tiered cache that requires key and value serialization.
2 Configure a serializer for the values. The SimpleTransientStringSerializer in this case. For the sake of simplicity we have omitted key serializer. Since one is not provided explicitly, ehcache would provide default serializers to perform the key serialization.
3 Verify that the cache/serializer works.

In the previous section we demonstrated the use of a transient serializer. We used that serializer with a transient cache and everything works just fine. Now imagine what would happen if we use the same serializer with a persistent cache. Everything would work as long as your application is running. Once you close the cache manager or end the application the data associated with the cache will be persisted so that the same data will be available on a restart. But there is a serious problem. The following piece of code would demonstrate that:

CacheConfiguration<Long, String> cacheConfig =
    CacheConfigurationBuilder.newCacheConfigurationBuilder(
        Long.class, String.class,
        ResourcePoolsBuilder.heap(10).disk(10, MemoryUnit.MB, true))  (1)
        .withValueSerializer(SimpleTransientStringSerializer.class)
        .build();

CacheManager cacheManager = CacheManagerBuilder.newCacheManagerBuilder()
    .with(new CacheManagerPersistenceConfiguration(new File(PERSISTENCE_PATH)))   (2)
    .withCache("fruitsCache", cacheConfig)
    .build(true);

Cache<Long, String> fruitsCache = cacheManager.getCache("fruitsCache", Long.class, String.class);   (3)
fruitsCache.put(1L, "apple");
fruitsCache.put(2L, "mango");
fruitsCache.put(3L, "orange");   (4)
assertThat(fruitsCache.get(1L), is("apple"));   (5)

cacheManager.close();   (6)
cacheManager.init();    (7)
fruitsCache = cacheManager.getCache("fruitsCache", Long.class, String.class);   (8)
assertThat(fruitsCache.get(1L), is("apple"));   (9)
1 Create a cache configuration with persistent disk tier.
2 Configure the LocalPersistenceService for the cache manager.
3 Retrieve the cache.
4 Populate data.
5 Verify that everything works.
6 Close the cache manager.
7 Reinitialize the cache manager.
8 Retrieve the cache.
9 Retrieve a cached/persisted value.

The above piece of code would fail in the cache creation step since the serializer provided does not meet the 2-arg constructor requirement for persistent caches. But why does Ehcache enforce this requirement and fail-fast if the requirement is violated? What would have happened if we had proceeded with the sample code? Would it have failed? If yes, then where?

The above piece of code would have failed in step 9 because the cache would not be able to retrieve the persisted data. Because the serializer that you provided would fail in retrieving that data. When the cache is reinitialized, the associated serializer instance is also initialized for the cache to work. But the newly initialized serializer would have an empty state(empty stringIdMap and idStringMap maps and the id initialized to 0). So when the cache tries to read a value it gets an integer value from the persistent tier as that is what got persisted. But using the empty state the serializer will not able to map that value to a fruit name, and so it would throw. That leaves the persisted data unusable. So what could you have done differently to make it work?

The answer is simple. Persist the serializer’s state as well and restore it when the cache is re-initialized. And that is exactly what persistent serializers would do.

Persistent serializers

  • A persistent serializer persists its state and retrieves it when reinitialized.

  • A persistent serializer implementation can choose to persist the data wherever it wants.

But a recommended way is to use the cache manager’s LocalPersistenceService so that the cache manager would take care of the persistence. Inorder to do that, the serializer implementation needs to have a constructor that takes in a FileBasedPersistenceContext as an argument, in addition to the class loader argument. The use of the FileBasedPersistenceContext argument is optional. But the presence of this double-argument constructor is a strict requirement for persistent caches. When the cache using this serializer is initialized, this 2-argument constructor is used to instantiate the serializer.

Have a look at this implementation of a persistent serializer. It is just an extension of the same old transient serializer with the persistent stuff wired in.

public class SimplePersistentStringSerializer extends SimpleTransientStringSerializer implements Closeable {

  private final File stateFile;

  public SimplePersistentStringSerializer(final ClassLoader loader, FileBasedPersistenceContext persistence) throws IOException, ClassNotFoundException {
    super(loader);
    stateFile = new File(persistence.getDirectory(), "serializer.data");
    if(stateFile.exists()) {
      restoreState();
    }
  }

  @Override
  public void close() throws IOException {
    persistState();
  }

  private void restoreState() throws IOException, ClassNotFoundException {
    FileInputStream fin = new FileInputStream(stateFile);
    try {
      ObjectInputStream oin = new ObjectInputStream(fin);
      try {
        idStringMap = (Map<Integer, String>) oin.readObject();
        stringIdMap = (Map<String, Integer>) oin.readObject();
        id = oin.readInt();
      } finally {
        oin.close();
      }
    } finally {
      fin.close();
    }
  }

  private void persistState() throws IOException {
    OutputStream fout = new FileOutputStream(stateFile);
    try {
      ObjectOutputStream oout = new ObjectOutputStream(fout);
      try {
        oout.writeObject(idStringMap);
        oout.writeObject(stringIdMap);
        oout.writeInt(id);
      } finally {
        oout.close();
      }
    } finally {
      fout.close();
    }
  }
}

In the above persistent serializer, the state or the serialization/deserialization has not changed. The only additional thing is the persistence logic. And that is fairly simple too. The state is restored on initialization if one is found, and persisted on close. And have a look at the sample from the previous section modified to use our persistent serializer.

CacheConfiguration<Long, String> cacheConfig =
    CacheConfigurationBuilder.newCacheConfigurationBuilder(
        Long.class, String.class,
        ResourcePoolsBuilder.newResourcePoolsBuilder()
            .heap(10, EntryUnit.ENTRIES).disk(10, MemoryUnit.MB, true))
        .withValueSerializer(SimplePersistentStringSerializer.class)   (1)
        .build();

CacheManager cacheManager = CacheManagerBuilder.newCacheManagerBuilder()
    .with(new CacheManagerPersistenceConfiguration(new File(PERSISTENCE_PATH)))
    .withCache("fruitsCache", cacheConfig)
    .build(true);

Cache<Long, String> fruitsCache = cacheManager.getCache("fruitsCache", Long.class, String.class);
fruitsCache.put(1L, "apple");
fruitsCache.put(2L, "mango");
fruitsCache.put(3L, "orange");
assertThat(fruitsCache.get(1L), is("apple"));

cacheManager.close();
cacheManager.init();
fruitsCache = cacheManager.getCache("fruitsCache", Long.class, String.class);
assertThat(fruitsCache.get(1L), is("apple"));
1 The only change from the previous sample is the usage of SimplePersistentStringSerializer here.

Third-party serializers

Ehcache by-default relies on a tweaked form of java standard serialization to perform serialization and deserialization. But most of you already know that java built-in serialization is not the best performing serialization technique. A lot of alternative serialization techniques are available in the market. With the custom serializers support of ehcache you can take advantage of any one of those third-party serializers out there and use those within ehcache. All you have to do is write a custom serializer using the third-party serializer of your choice.

In-order to demonstrate that, I have written a custom serializer using the popular serialization framework Kryo. Samples used in this section are not the same fruits cache based ones. Here I’m using an employee cache of type Cache<Long, Employee>. I have kept the Employee object as simple as possible and yet represent a real-life object structure. These are the structures that we have used:

public class Description {

  String alias;
  int id;

  public Description() {}

  public Description(final String alias, final int id) {
    this.alias = alias;
    this.id = id;
  }

  @Override
  public boolean equals(final Object obj) {
    if(this == obj) return true;
    if(obj == null || this.getClass() != obj.getClass()) return false;

    Description other = (Description)obj;
    if(id != other.id) return false;
    if ((alias == null) ? (alias != null) : !alias.equals(other.alias)) return false;
    return true;
  }

  @Override
  public int hashCode() {
    int result = 1;
    result = 31 * result + id;
    result = 31 * result + (alias == null ? 0 : alias.hashCode());
    return result;
  }

  @Override
  public String toString() {
    return alias + ";" + id;
  }
}
public class Person {

  String name;
  int age;
  Description desc;

  public Person() {}

  public Person(String name, int age, Description desc) {
    this.name = name;
    this.age = age;
    this.desc = desc;
  }

  @Override
  public boolean equals(final Object other) {
    if(this == other) return true;
    if(other == null) return false;
    if(!(other instanceof Person)) return false;

    Person that = (Person)other;
    if(age != that.age) return false;
    if((name == null) ? (that.name != null) : !name.equals(that.name)) return false;

    return true;
  }

  @Override
  public int hashCode() {
    int result = 1;
    result = 31 * result + age;
    result = 31 * result + (name == null ? 0 : name.hashCode());
    return result;
  }

  @Override
  public String toString() {
    return name + ";" + age + "::" + desc;
  }
}
public class Employee extends Person {

  long employeeId;

  public Employee() {}

  public Employee(long employeeId, String name, int age, Description desc) {
    super(name, age, desc);
    this.employeeId = employeeId;
  }

  @Override
  public boolean equals(final Object obj) {
    if (!super.equals(obj)) return false;
    if(!(obj instanceof Employee)) return false;

    Employee other = (Employee)obj;
    if(employeeId != other.employeeId) return false;

    return true;
  }

  @Override
  public int hashCode() {
    return (31 * (int)employeeId) +  super.hashCode();
  }

  @Override
  public String toString() {
    return employeeId + ";" + super.toString();
  }
}
None of the above classes are Serializable. Yet they can be serialized with Kryo. But for that every class needs a no-arg constructor and these classes meet that requirement.

So here is the kryo based custom serializer:

public class KryoSerializer implements Serializer<Employee> {

  private static final Kryo kryo = new Kryo();

  public KryoSerializer(ClassLoader loader) {
    //no-op
  }

  @Override
  public ByteBuffer serialize(final Employee object) throws SerializerException {
    Output output = new Output(4096);
    kryo.writeObject(output, object);
    return ByteBuffer.wrap(output.getBuffer());
  }

  @Override
  public Employee read(final ByteBuffer binary) throws ClassNotFoundException, SerializerException {
    Input input =  new Input(new ByteBufferInputStream(binary)) ;
    return kryo.readObject(input, Employee.class);
  }

  @Override
  public boolean equals(final Employee object, final ByteBuffer binary) throws ClassNotFoundException, SerializerException {
    return object.equals(read(binary));
  }

}

The above serializer is a state-less one that demonstrates the basic integration with kryo. Here is the sample code that uses the same:

CacheManager cacheManager = CacheManagerBuilder.newCacheManagerBuilder().build(true);
CacheConfiguration<Long, Employee> cacheConfig =
    CacheConfigurationBuilder.newCacheConfigurationBuilder(Long.class, Employee.class, ResourcePoolsBuilder.heap(10))
        .withValueSerializer(KryoSerializer.class)  (1)
        .build();

Cache<Long, Employee> employeeCache = cacheManager.createCache("employeeCache", cacheConfig);
Employee emp =  new Employee(1234, "foo", 23, new Description("bar", 879));
employeeCache.put(1L, emp);
assertThat(employeeCache.get(1L), is(emp));
1 Here we configure the KryoSerializer for the VALUE.

Using some advanced features of kryo I managed to write the transient only and persistent only versions too.

Here is the transient one:

public class TransientKryoSerializer implements Serializer<Employee>, Closeable{

  protected static final Kryo kryo = new Kryo();

  protected Map<Class, Integer> objectHeaderMap = new HashMap<Class, Integer>();  (1)

  public TransientKryoSerializer() {
  }

  public TransientKryoSerializer(ClassLoader loader) {
    populateObjectHeadersMap(kryo.register(Employee.class));  (2)
    populateObjectHeadersMap(kryo.register(Person.class));  (3)
    populateObjectHeadersMap(kryo.register(Description.class)); (4)
  }

  protected void populateObjectHeadersMap(Registration reg) {
    objectHeaderMap.put(reg.getType(), reg.getId());  (5)
  }

  @Override
  public ByteBuffer serialize(Employee object) throws SerializerException {
    Output output = new Output(new ByteArrayOutputStream());
    kryo.writeObject(output, object);
    output.close();

    return ByteBuffer.wrap(output.getBuffer());
  }

  @Override
  public Employee read(final ByteBuffer binary) throws ClassNotFoundException, SerializerException {
    Input input =  new Input(new ByteBufferInputStream(binary)) ;
    return kryo.readObject(input, Employee.class);
  }

  @Override
  public boolean equals(final Employee object, final ByteBuffer binary) throws ClassNotFoundException, SerializerException {
    return object.equals(read(binary));
  }

  @Override
  public void close() throws IOException {
    objectHeaderMap.clear();
  }
}
1 This objectHeaderMap is the state of the serializer. When an object is serialized the fully qualified name of the class is written in the header. Since writing the entire name is costly I decided to map these names to some integer values and then write out that integer instead of the name. So this map would contain the mapping between the classes and the corresponding integer values.
2 Here we register a class with kryo and then kryo will assign an integer value to that class so that all instances of class will be serialized with this assigned integer in-place of the fully-qualified class name. The Employee class in this case. Refer Kryo#Registartion for more information.
3 Since Employee extends Person we register that too.
4 Since the Person class contain a Description instance we register that too. So the idea is to register all known custom class types associated with the object to be serialized(the employee object).
5 This is how we populate the objectHeaderMap every time we register a class.

The following sample is the same as the one in the previous section with just the serializer changed:

CacheManager cacheManager = CacheManagerBuilder.newCacheManagerBuilder().build(true);
CacheConfiguration<Long, Employee> cacheConfig =
    CacheConfigurationBuilder.newCacheConfigurationBuilder(Long.class, Employee.class, ResourcePoolsBuilder.heap(10))
        .withValueSerializer(TransientKryoSerializer.class)
        .build();

Cache<Long, Employee> employeeCache = cacheManager.createCache("employeeCache", cacheConfig);
Employee emp =  new Employee(1234, "foo", 23, new Description("bar", 879));
employeeCache.put(1L, emp);
assertThat(employeeCache.get(1L), is(emp));

The above sample must be self explanatory as we have already seen this sample so many times.

And now the persistent adaptation of the transient serializer is here:

public class PersistentKryoSerializer extends TransientKryoSerializer {

  private final File stateFile;

  public PersistentKryoSerializer(ClassLoader loader, FileBasedPersistenceContext persistence) throws IOException, ClassNotFoundException {
    stateFile = new File(persistence.getDirectory(), "PersistentKryoSerializerState.ser");
    if(stateFile.exists()) {  (1)
      restoreState();   (2)
      for(Map.Entry<Class, Integer> entry: objectHeaderMap.entrySet()) {  (3)
        kryo.register(entry.getKey(), entry.getValue());  (4)
      }
    }
  }

  @Override
  public void close() throws IOException {
    persistState(); (5)
  }

  private void persistState() throws FileNotFoundException {
    Output output = new Output(new FileOutputStream(stateFile));
    try {
      kryo.writeObject(output, objectHeaderMap);
    } finally {
      output.close();
    }
  }

  private void restoreState() throws FileNotFoundException {
    Input input = new Input(new FileInputStream(stateFile));
    try {
      objectHeaderMap = kryo.readObject(input, HashMap.class);
    } finally {
      input.close();
    }
  }
}

You must be familiar with this routine already:

1 On initialization, if a persistent file is found…​
2 Restore the contents of the file which essentially restores the objectHeaderMap
3 Then iterate through the contents of the map and…​
4 Register the types again with kryo using the same integer mapped values. Then only the persisted data can be deserialized as they are persisted with these integer values in their headers.
5 On close, the map is serialized and persisted to a file.

And the familiar test sample again testing this persistent serializer implementation:

CacheConfiguration<Long, Employee> cacheConfig =
    CacheConfigurationBuilder.newCacheConfigurationBuilder(
        Long.class, Employee.class,
        ResourcePoolsBuilder.newResourcePoolsBuilder()
            .heap(10, EntryUnit.ENTRIES).offheap(5, MemoryUnit.MB).disk(10, MemoryUnit.MB, true))
        .withValueSerializer(PersistentKryoSerializer.class)
        .build();

CacheManager cacheManager = CacheManagerBuilder.newCacheManagerBuilder()
    .with(new CacheManagerPersistenceConfiguration(new File(PERSISTENCE_PATH)))
    .withCache("employeeCache", cacheConfig)
    .build(true);

Cache<Long, Employee> employeeCache = cacheManager.getCache("employeeCache", Long.class, Employee.class);
Employee emp =  new Employee(1234, "foo", 23, new Description("bar", 879));
employeeCache.put(1L, emp);
assertThat(employeeCache.get(1L), is(emp));

cacheManager.close();
cacheManager.init();
employeeCache = cacheManager.getCache("employeeCache", Long.class, Employee.class);
assertThat(employeeCache.get(1L), is(emp));