The Ehcache Search API allows you to execute arbitrarily complex queries against caches with pre-built indexes. The development of alternative indexes on values provides the ability for data to be looked up based on multiple criteria instead of just keys.
As of version 2.6, standalone Ehcache with BigMemory, as well as Terracotta clustered caches, use indexing. The Ehcache Search API also queries standalone Ehcache without BigMemory using a direct search method. For more information, refer to the Implementation and Performance section below.
Searchable attributes may be extracted from both keys and values. Keys, values, or summary values (Aggregators) can all be returned. Here is a simple example: Search for 32-year-old males and return the cache values.
Results results = cache.createQuery().includeValues()
.addCriteria(age.eq(32).and(gender.eq("male"))).execute();
Searches can be performed against Element keys and values, but they must be treated as attributes. Some Element keys and values are directly searchable and can simply be added to the search index as attributes. Some Element keys and values must be made searchable by extracting attributes with supported search types out of the keys and values. It is the attributes themselves which are searchable.
Caches can be made searchable, on a per cache basis, either by configuration or programmatically.
Caches are made searchable by adding a <searchable/> tag to the ehcache.xml.
<cache name="cache2" maxBytesLocalHeap="16M" eternal="true" maxBytesLocalOffHeap="256M">
<persistence strategy="localRestartable"/>
<searchable/>
</cache>
This configuration will scan keys and values and, if they are of supported search types, add them as attributes called "key" and "value" respectively. If you do not want automatic indexing of keys and values, you can disable it with:
<cache name="cacheName" ...>
<searchable keys="false" values="false">
...
</searchable>
</cache>
You might want to do this if you have a mix of types for your keys or values. The automatic indexing will throw an exception if types are mixed.
Often keys or values will not be directly searchable and instead you will need to extract searchable attributes out of them. The following example shows this more typical case. Attribute Extractors are explained in more detail in the following section.
<cache name="cache3" maxEntriesLocalHeap="10000" eternal="true" maxBytesLocalOffHeap="10G">
<persistence strategy="localRestartable"/>
<searchable>
<searchAttribute name="age" class="net.sf.ehcache.search.TestAttributeExtractor"/>
<searchAttribute name="gender" expression="value.getGender()"/>
</searchable>
</cache>
The following example shows how to programmatically create the cache configuration, with search attributes.
Configuration cacheManagerConfig = new Configuration();
CacheConfiguration cacheConfig = new CacheConfiguration("myCache", 0).eternal(true);
Searchable searchable = new Searchable();
cacheConfig.addSearchable(searchable);
// Create attributes to use in queries.
searchable.addSearchAttribute(new SearchAttribute().name("age"));
// Use an expression for accessing values.
searchable.addSearchAttribute(new SearchAttribute()
.name("first_name")
.expression("value.getFirstName()"));
searchable.addSearchAttribute(new SearchAttribute().name("last_name").expression("value.getLastName()"));
searchable.addSearchAttribute(new SearchAttribute().name("zip_code").expression("value.getZipCode()"));
CacheManager cacheManager = new CacheManager(cacheManagerConfig);
cacheManager.addCache(new Cache(cacheConfig));
Ehcache myCache = cacheManager.getEhcache("myCache");
// Now create the attributes and queries, then execute.
...
To learn more about the Ehcache Search API, see the net.sf.ehcache.search* packages in this Javadoc.
In addition to configuring a cache to be searchable, you must define the attributes that will be used in searches.
Attributes are extracted from keys or values, either during search or, if using Distributed Ehcache, on put() into the cache. This is done using AttributeExtractors. Extracted attributes must be one of the following supported types:
If an attribute cannot be extracted, due to not being found or being the wrong type, an AttributeExtractorException is thrown on search execution or, if using Distributed Ehcache, on put().
The parts of an Element that are well-known attributes can be referenced by some predefined, well-known names.
If a key and/or value is of a supported search type, it is added automatically as an attribute with the name
"key" or "value".
These well-known attributes have the convenience of being constant attributes made available on the Query class.
So, for example, the attribute for "key" may be referenced in a query by Query.KEY. For even greater readability, it is
recommended to statically import so that, in this example, you would just use KEY.
| Well-known Attribute Name | Attribute Constant |
|---|---|
| key | Query.KEY |
| value | Query.VALUE |
The ReflectionAttributeExtractor is a built-in search attribute extractor which uses JavaBean conventions and also understands a simple form of expression. Where a JavaBean property is available and it is of a searchable type, it can be simply declared:
<cache>
<searchable>
<searchAttribute name="age"/>
</searchable>
</cache>
The expression language of the ReflectionAttributeExtractor also uses method/value dotted expression chains. The expression chain must start with one of either "key", "value", or "element". From the starting object a chain of either method calls or field names follows. Method calls and field names can be freely mixed in the chain. Some more examples:
<cache>
<searchable>
<searchAttribute name="age" expression="value.person.getAge()"/>
</searchable>
</cache>
<cache>
<searchable>
<searchAttribute name="name" expression="element.toString()"/>
</searchable>
</cache>
Note: The method and field name portions of the expression are case sensitive.
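As a rough illustration of how such a dotted chain can be resolved, the sketch below walks an expression with plain Java reflection. It is not the ReflectionAttributeExtractor source: the class DottedExpressionSketch and the sample types Person and Employee are hypothetical, and only the "key" and "value" starting points are modelled because the sketch has no Element type. Reflection lookups match names exactly, which is one way to see why the method and field portions are case sensitive.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;

public class DottedExpressionSketch {

    /** Hypothetical value type exposing a getter. */
    public static class Person {
        private final int age;
        public Person(int age) { this.age = age; }
        public int getAge() { return age; }
    }

    /** Hypothetical value type exposing a public field. */
    public static class Employee {
        public final Person person;
        public Employee(Person person) { this.person = person; }
    }

    /** Resolves a chain such as "value.person.getAge()" against a key/value pair. */
    public static Object evaluate(Object key, Object value, String expression) {
        String[] parts = expression.split("\\.");
        Object current;
        if (parts[0].equals("key")) {
            current = key;
        } else if (parts[0].equals("value")) {
            current = value;
        } else {
            throw new IllegalArgumentException("Expression must start with key or value: " + expression);
        }
        try {
            for (int i = 1; i < parts.length; i++) {
                String part = parts[i];
                if (part.endsWith("()")) {
                    // Method call: strip the parentheses and invoke with no arguments.
                    Method method = current.getClass().getMethod(part.substring(0, part.length() - 2));
                    current = method.invoke(current);
                } else {
                    // Field name: read a public field of the current object.
                    Field field = current.getClass().getField(part);
                    current = field.get(current);
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot evaluate " + expression, e);
        }
        return current;
    }

    public static void main(String[] args) {
        Employee employee = new Employee(new Person(32));
        System.out.println(evaluate("k1", employee, "value.person.getAge()"));
    }
}
```

Here the chain mixes a field name (person) with a method call (getAge()), mirroring the value.person.getAge() expression in the configuration example.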
In more complex situations, you can create your own attribute extractor by implementing the AttributeExtractor interface. Provide your extractor class, as shown in the following example:
<cache name="cache2" maxEntriesLocalHeap="0" eternal="true">
<persistence strategy="none"/>
<searchable>
<searchAttribute name="age" class="net.sf.ehcache.search.TestAttributeExtractor"/>
</searchable>
</cache>
If you need to pass state to your custom extractor, you may do so with properties, as shown in the following example:
<cache>
<searchable>
<searchAttribute name="age"
class="net.sf.ehcache.search.TestAttributeExtractor"
properties="foo=this,bar=that,etc=12" />
</searchable>
</cache>
If properties are provided, then the attribute extractor implementation must have a public constructor that accepts a single java.util.Properties instance.
Ehcache Search uses a fluent, object-oriented Query API, following DSL principles, which should feel familiar and natural to Java programmers. Here is a simple example:
Query query = cache.createQuery().addCriteria(age.eq(35)).includeKeys().end();
Results results = query.execute();
If declared and available, the well-known attributes are referenced by their names or the convenience attributes are used directly, as shown in this example:
Results results = cache.createQuery().addCriteria(Query.KEY.eq(35)).execute();
Results results = cache.createQuery().addCriteria(Query.VALUE.lt(10)).execute();
Other attributes are referenced by the names given them in the configuration. For example:
Attribute<Integer> age = cache.getSearchAttribute("age");
Attribute<String> gender = cache.getSearchAttribute("gender");
Attribute<String> name = cache.getSearchAttribute("name");
A Query is built up using Expressions. Expressions may include logical operators such as "and" and "or", and comparison operators such as "ge" (>=), "between", and "ilike".
The addCriteria(...) method is used to add a clause to a query. Adding a further clause automatically combines the clauses with "and".
query = cache.createQuery().includeKeys().addCriteria(age.le(65)).addCriteria(gender.eq("male")).end();
Both logical and comparison operators implement the Criteria interface.
To add a criteria with a different logical operator, explicitly nest it within a new logical operator Criteria Object. For example, to check for age = 35 or gender = female, do the following:
query.addCriteria(new Or(age.eq(35),
gender.eq(Gender.FEMALE))
);
More complex compound expressions can be further created with extra nesting. See the Expression JavaDoc for a complete list of expressions.
Operators are available as methods on attributes, so they are invoked with dot notation on the attribute. For example, "lt" means "less than" and is used as age.lt(10), which is a shorthand way of saying age LessThan(10).
The full listing of operator shorthand is shown below.
| Shorthand | Criteria Class | Description |
|---|---|---|
| and | And | The Boolean AND logical operator |
| between | Between | A comparison operator meaning between two values |
| eq | EqualTo | A comparison operator meaning Java "equals to" condition |
| gt | GreaterThan | A comparison operator meaning greater than. |
| ge | GreaterThanOrEqual | A comparison operator meaning greater than or equal to. |
| in | InCollection | A comparison operator meaning in the collection given as an argument |
| lt | LessThan | A comparison operator meaning less than. |
| le | LessThanOrEqual | A comparison operator meaning less than or equal to |
| ilike | ILike | A regular expression matcher. '?' and "*" may be used. Note that placing a wildcard in front of the expression will cause a table scan. ILike is always case insensitive. |
| not | Not | The Boolean NOT logical operator |
| ne | NotEqualTo | A comparison operator meaning not the Java "equals to" condition |
| or | Or | The Boolean OR logical operator |
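To make the ilike wildcard behaviour more concrete, here is a minimal sketch that translates such a pattern into a java.util.regex.Pattern. It is a stand-in written for illustration, not the ILike implementation: the class WildcardMatchSketch is hypothetical, and the sketch assumes that '?' matches exactly one character and '*' matches zero or more characters.

```java
import java.util.regex.Pattern;

public class WildcardMatchSketch {

    /** Translates a wildcard expression into a case-insensitive regular expression. */
    public static Pattern toPattern(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");   // assumed: zero or more characters
            } else if (c == '?') {
                regex.append('.');    // assumed: exactly one character
            } else {
                // Quote everything else so regex metacharacters are matched literally.
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    }

    /** Returns true when the whole candidate string matches the wildcard expression. */
    public static boolean matches(String wildcard, String candidate) {
        return toPattern(wildcard).matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("H*", "harry"));
        System.out.println(matches("h?rry", "Harry"));
        System.out.println(matches("H?", "harry"));
    }
}
```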
By default, a query can be executed and then modified and re-executed. If end is called,
the query is made immutable.
Queries return a Results object which contains a list of objects of class Result. Each Element in the cache found with a query will be represented as a Result object. So if a query finds
350 elements, there will be 350 Result objects. An exception to this would be if no keys or attributes are included but
aggregators are -- in this case, there will be exactly one Result present.
A Result object can contain:
- the Element key, when includeKeys() is added to the query,
- the Element value, when includeValues() is added to the query,
- extracted attributes, when includeAttribute(...) is added to the query. To access an attribute from a Result, use getAttribute(Attribute<T> attribute).
- aggregator results, available through Result.getAggregatorResults which returns a list of Aggregators in the same order in which they were used in the Query.
Aggregators are added with query.includeAggregator(<attribute>.<aggregator>).
For example, to find the sum of the age attribute:
query.includeAggregator(age.sum());
For a complete list of aggregators, refer to the Aggregators JavaDoc.
Query results may be ordered in ascending or descending order by adding an addOrderBy clause to the query, which takes
as parameters the attribute to order by and the ordering direction. For example, to order the results by ages in ascending order:
query.addOrderBy(age, Direction.ASCENDING);
With Ehcache 2.6 and higher, query results may be grouped similarly to using an SQL GROUP BY statement. The Ehcache GroupBy feature provides the option to group results according to specified attributes by adding an addGroupBy clause to the query, which takes as parameters the attributes to group by. For example, you can group results by department and location like this:
Query q = cache.createQuery();
Attribute<String> dept = cache.getSearchAttribute("dept");
Attribute<String> loc = cache.getSearchAttribute("location");
q.includeAttribute(dept);
q.includeAttribute(loc);
q.addCriteria(cache.getSearchAttribute("salary").gt(100000));
q.includeAggregator(Aggregators.count());
q.addGroupBy(dept, loc);
The GroupBy clause groups the results from includeAttribute() and allows aggregate functions to be performed on the grouped attributes. To retrieve the attributes that are associated with the aggregator results, you can use:
String deptValue = singleResult.getAttribute(dept);
String locValue = singleResult.getAttribute(loc);
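The semantics of filtering, grouping, and counting can be pictured with plain Java streams. The sketch below is only an analogy and does not use the Ehcache API: the class GroupBySketch, its Employee type, and the sample figures are hypothetical. It filters first (the role played by addCriteria), then groups by department and location, then counts within each group.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupBySketch {

    /** Hypothetical cache value with the three attributes used in the query. */
    public static class Employee {
        final String dept;
        final String location;
        final int salary;

        public Employee(String dept, String location, int salary) {
            this.dept = dept;
            this.location = location;
            this.salary = salary;
        }
    }

    /** Counts employees earning more than minSalary, grouped by (dept, location). */
    public static Map<List<String>, Long> countByDeptAndLocation(List<Employee> employees, int minSalary) {
        return employees.stream()
                // The criteria apply to all results before any grouping happens.
                .filter(e -> e.salary > minSalary)
                // One group per distinct (dept, location) pair, with a count per group.
                .collect(Collectors.groupingBy(
                        e -> Arrays.asList(e.dept, e.location),
                        Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Employee> staff = Arrays.asList(
                new Employee("eng", "SF", 150000),
                new Employee("eng", "SF", 120000),
                new Employee("eng", "NY", 90000),
                new Employee("ops", "NY", 110000));
        System.out.println(countByDeptAndLocation(staff, 100000));
    }
}
```

Each map entry plays the role of one grouped Result: the key holds the grouped attribute values and the value holds the aggregator result for that group.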
Grouping query results adds another step to the query--first results are returned, and second the results are grouped. This necessitates the following rules and considerations when using GroupBy:
- Any attribute specified with includeAttribute() should also be included in the GroupBy clause.
- includeKeys() and includeValues() may not be used in a query that has a GroupBy clause.
- An addCriteria() clause applies to all results prior to grouping.
By default a query will return an unlimited number of results. For example, the following query will return all keys in the cache.
Query query = cache.createQuery();
query.includeKeys();
query.execute();
If too many results are returned, it could cause an OutOfMemoryError.
The maxResults clause is used to limit the size of the results.
For example, to limit the above query to the first 100 elements found:
Query query = cache.createQuery();
query.includeKeys();
query.maxResults(100);
query.execute();
Note: When maxResults is used with GroupBy, it limits the number of groups.
When you are done with the results, call discard() to free up resources.
In the distributed implementation with Terracotta, resources may be used to hold results for paging or return.
To determine what was returned by a query, use one of the interrogation methods on Results:
- hasKeys()
- hasValues()
- hasAttributes()
- hasAggregators()
We have created a simple standalone sample application with few dependencies for you to easily get started with Ehcache Search. You can also check out the source:
git clone git://github.com/sharrissf/Ehcache-Search-Sample.git
The Ehcache Test Sources page has further examples on how to use each Ehcache Search feature.
Ehcache Search is readily amenable to scripting. The following example shows how to use it with BeanShell:
Interpreter i = new Interpreter();
//Auto discover the search attributes and add them to the interpreter's context
Map<String, SearchAttribute> attributes = cache.getCacheConfiguration().getSearchAttributes();
for (Map.Entry<String, SearchAttribute> entry : attributes.entrySet()) {
i.set(entry.getKey(), cache.getSearchAttribute(entry.getKey()));
LOG.info("Setting attribute " + entry.getKey());
}
//Define the query and results. Add things which would be set in the GUI i.e.
//includeKeys and add to context
Query query = cache.createQuery().includeKeys();
Results results = null;
i.set("query", query);
i.set("results", results);
//This comes from the freeform text field
String userDefinedQuery = "age.eq(35)";
//Add on the things that we need
String fullQueryString = "results = query.addCriteria(" + userDefinedQuery + ").execute()";
i.eval(fullQueryString);
results = (Results) i.get("results");
assertTrue(2 == results.size());
for (Result result : results.all()) {
LOG.info("" + result.getKey());
}
This implementation uses indexes which are maintained on each Terracotta server. In Ehcache EX the index is on a single active server. In Ehcache FX the cache is sharded across the number of active nodes in the cluster. The index for each shard is maintained on that shard's server. Searches are performed using the Scatter-Gather pattern. The query executes on each node and the results are then aggregated back in the Ehcache that initiated the search.
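The Scatter-Gather pattern can be sketched in a few lines of plain Java. This is an illustration of the pattern only, under the assumption that each shard is just an in-memory list: the class ScatterGatherSketch is hypothetical and there are no Terracotta servers or indexes involved. The query is sent to every shard in parallel and the partial results are merged by the caller.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class ScatterGatherSketch {

    /** Runs the criteria against every shard in parallel, then merges the partial results. */
    public static <T> List<T> search(List<List<T>> shards, Predicate<T> criteria) {
        // Scatter: start one asynchronous partial search per shard.
        List<CompletableFuture<List<T>>> partials = shards.stream()
                .map(shard -> CompletableFuture.supplyAsync(
                        () -> shard.stream().filter(criteria).collect(Collectors.toList())))
                .collect(Collectors.toList());

        // Gather: wait for each shard and append its matches in shard order.
        List<T> merged = new ArrayList<>();
        for (CompletableFuture<List<T>> partial : partials) {
            merged.addAll(partial.join());
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<Integer>> shards = Arrays.asList(
                Arrays.asList(1, 5, 9),
                Arrays.asList(12, 3),
                Arrays.asList(20));
        System.out.println(search(shards, n -> n > 4));
    }
}
```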
Search operations perform in O(log n / number of shards) time. Performance is excellent but can be improved simply by adding more servers to the FX array. Also, because Search results are returned over the network, and the data returned could potentially be very large, techniques to limit return size are recommended. For more information, refer to Best Practices.
As of version 2.6, standalone Ehcache with BigMemory uses a Search index that is maintained at the local node. The index is stored under a directory in the DiskStore and is available whether or not persistence is enabled. Any overflow from the on-heap tier of the cache, whether to the off-heap tier or to the disk tier, is searched using indexes.
Search operations perform in O(log(n)) time. For tips that can aid performance, refer to Best Practices.
For caches that are on-heap only, the standalone Ehcache Search implementation does not use indexes. Instead, it performs a fast iteration of the cache, relying on the very fast access to do the equivalent of a table scan for each query. Each element in the cache is only visited once. Attributes are not extracted ahead of time. They are done during query execution.
Search operations perform in O(n) time. Check out this Maven-based performance test showing performance of standalone Ehcache without BigMemory. The test shows search performance of an average of representative queries at 4.6 ms for a 10,000 entry cache, and 427 ms for a 1,000,000 entry cache. Accordingly, standalone implementation is suitable for development and testing.
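The difference between an O(n) scan and an indexed lookup can be illustrated with standard collections. The sketch below is not how Ehcache stores its indexes; the class ScanVersusIndexSketch and the use of a TreeMap as the "index" are assumptions made for illustration. Both methods answer the same range query on an age attribute, one by visiting every entry and one by navigating a sorted structure.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class ScanVersusIndexSketch {

    /** O(n): visit every entry once and test the attribute during the scan. */
    public static List<String> scan(Map<String, Integer> ageByKey, int low, int high) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : ageByKey.entrySet()) {
            int age = entry.getValue();
            if (age >= low && age <= high) {
                keys.add(entry.getKey());
            }
        }
        return keys;
    }

    /** Builds a sorted index from attribute value to the keys that hold it. */
    public static NavigableMap<Integer, List<String>> buildIndex(Map<String, Integer> ageByKey) {
        NavigableMap<Integer, List<String>> index = new TreeMap<>();
        for (Map.Entry<String, Integer> entry : ageByKey.entrySet()) {
            index.computeIfAbsent(entry.getValue(), age -> new ArrayList<>()).add(entry.getKey());
        }
        return index;
    }

    /** O(log n) to locate the range, plus time proportional to the number of matches. */
    public static List<String> lookup(NavigableMap<Integer, List<String>> index, int low, int high) {
        List<String> keys = new ArrayList<>();
        for (List<String> bucket : index.subMap(low, true, high, true).values()) {
            keys.addAll(bucket);
        }
        return keys;
    }

    public static void main(String[] args) {
        Map<String, Integer> ageByKey = new LinkedHashMap<>();
        ageByKey.put("a", 30);
        ageByKey.put("b", 35);
        ageByKey.put("c", 40);
        ageByKey.put("d", 35);
        System.out.println(scan(ageByKey, 35, 40));
        System.out.println(lookup(buildIndex(ageByKey), 35, 40));
    }
}
```

The two methods return the same keys, though possibly in a different order, since the scan follows cache iteration order and the lookup follows attribute order.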
When using standalone Ehcache without BigMemory in production, it is recommended to search only caches that hold fewer than 1 million elements. Performance varies between Criteria. For example, here are some queries and their execution times on a 200,000 element cache. (Note that these results are all faster than the times given above because each executes a single Criteria.)
final Query intQuery = cache.createQuery();
intQuery.includeKeys();
intQuery.addCriteria(age.eq(35));
intQuery.end();
Execute Time: 62ms
final Query stringQuery = cache.createQuery();
stringQuery.includeKeys();
stringQuery.addCriteria(state.eq("CA"));
stringQuery.end();
Execute Time: 125ms
final Query iLikeQuery = cache.createQuery();
iLikeQuery.includeKeys();
iLikeQuery.addCriteria(name.ilike("H*"));
iLikeQuery.end();
Execute Time: 180ms
Construct searches wisely by including only the data that is actually required.
- Use includeKeys() and/or includeAttribute() only if those values are actually required for your application logic.
- If result.getValue() is not called on the search results, then don't use includeValues() in the original query.
- Instead of using includeValues() and then result.getValue(), consider running the query for keys and calling cache.get() for each individual key.
Note: As of Ehcache 2.6, includeKeys() and includeValues() have lazy deserialization, which means that keys and values are deserialized only when result.getKey() or result.getValue() is called. This provides better latency overall, with a time cost only when the key or value is needed. However, there is still some time cost with includeKeys() and includeValues(), so consider carefully when constructing your queries.
Searchable keys and values are automatically indexed by default. If you will not be including them in your query, turn off automatic indexing with the following:
<cache name="cacheName" ...>
<searchable keys="false" values="false">
...
</searchable>
</cache>
Limit the size of the results set with query.maxResults(int number_of_results). Another recommendation for managing the size of the result set is to use a built-in Aggregator function to return a summary statistic (see the net.sf.ehcache.search.aggregator package in this Javadoc).
Make your search as specific as possible. Queries with "ILike" criteria and fuzzy (wildcard) searches may take longer than more specific queries. Also, if you are using a wildcard, try making it the trailing part of the string instead of the leading part ("321*" instead of "*123"). If you want leading wildcard searches, then you should create a <searchAttribute> with the string value reversed in it, so that your query can use the trailing wildcard instead.
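The reversed-string trick can be checked with a small, library-free sketch. The class ReversedAttributeSketch and its method names are hypothetical; in a real cache the reversed string would be produced by an attribute extractor and stored as its own search attribute. The point is that a suffix test on the original value is equivalent to a prefix test on the reversed value.

```java
public class ReversedAttributeSketch {

    /** Produces the value that would be stored in the extra, reversed search attribute. */
    public static String reversed(String value) {
        return new StringBuilder(value).reverse().toString();
    }

    /**
     * Answers "does the original value end with suffix?" using only a prefix test
     * on the reversed attribute, i.e. the equivalent of a trailing wildcard.
     */
    public static boolean endsWithViaReversedPrefix(String reversedAttribute, String suffix) {
        return reversedAttribute.startsWith(reversed(suffix));
    }

    public static void main(String[] args) {
        String stored = reversed("abc123");   // "321cba"
        // A leading-wildcard query "*123" becomes the trailing-wildcard query "321*".
        System.out.println(stored);
        System.out.println(endsWithViaReversedPrefix(stored, "123"));
    }
}
```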
When possible, use the query criteria "Between" instead of "LessThan" and "GreaterThan", or "LessThanOrEqual" and "GreaterThanOrEqual". For example, instead of using le(startDate) and ge(endDate), try not(between(startDate,endDate)).
Index dates as integers. This can save time and may even be faster if you have to do a conversion later on.
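One possible way to index a date as an integer is to pack it into a yyyyMMdd number, which keeps chronological order and stays readable. The sketch below assumes java.time is available and uses a hypothetical class DateAsIntSketch; the encoding is a choice made for illustration, and an epoch-day count would work equally well.

```java
import java.time.LocalDate;

public class DateAsIntSketch {

    /** Packs a date into an int of the form yyyyMMdd, e.g. 2012-03-15 becomes 20120315. */
    public static int toIndexInt(LocalDate date) {
        return date.getYear() * 10000 + date.getMonthValue() * 100 + date.getDayOfMonth();
    }

    public static void main(String[] args) {
        int start = toIndexInt(LocalDate.of(2012, 3, 15));
        int end = toIndexInt(LocalDate.of(2012, 12, 1));
        // Integer comparison agrees with chronological order for these dates.
        System.out.println(start + " < " + end + " : " + (start < end));
    }
}
```

A search attribute holding such an int can then be queried with the numeric comparison operators.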
Searches of eventually consistent caches are faster because queries are executed immediately, without waiting for pending transactions at the local node to commit. Note: This means that if a thread adds an element into an eventually consistent cache and immediately runs a query to fetch the element, it will not be visible in the search results until the update is published to the server.
Unlike cache operations which have selectable concurrency control and/or transactions, queries are asynchronous and Search results are eventually consistent with the caches.
Although indexes are updated synchronously, their state will lag slightly behind the state of the cache. The only exception is when the updating thread then performs a search.
For caches with concurrency control, an index will not reflect the new state of the cache until:
- commit has been called.
There are several ways unexpected results could present:
- Aggregators, such as sum(), might disagree with the same calculation done by re-accessing the cache for each key and repeating the calculation yourself.
Because the state of the cache can change between search executions, the following is recommended:
