Monday, May 4, 2009

Hibernate Wars: The Query Cache Strikes Back

Not so long ago, in a galaxy not very far away... a middle-aged programmer battled to free memory from the dark side of the hibernate query cache.  He was successful, or so it seemed. For the query cache memory problems had risen from the ashes -- stronger and more bloated than ever...


What's With All This Garbage?


We hit a case at work (again), where the java server process came to a grinding halt.  It wasn't dead, it just wasn't making much forward progress either.  A quick look at the heap stats showed we were nearly at capacity and that garbage collection was eating all CPU time, thus not allowing any real work to happen.  Looks like it is time to whip out the ol' memory profiler.  This time I went with Eclipse's Memory Analysis Tool .  It's pretty slick.  I suggest you try it.

Query Cache Waste Redux


Not terribly unexpected, the hibernate caches were the top consumer of heap space.  I have previously explored ways to trim down the memory used by the query cache.  But, we depend heavily on it and size it pretty big, so that's why I wasn't too surprised to see them at the top of the heap report.  Drilling down a little further showed that it was not the contents of the query cache results that were causing the problem.  That was the unexpected part.  Rats, our friend QueryKey is once again a giant waster of memory.



What is it this time?  In A Dirty Little Secret, I showed that you should use object identifiers in your HQL and as parameters so that full objects are not stored as part of the QueryKey.  This was a major win.  I also showed that by providing a decorator query cache implementation, you can reduce all duplicate stored parameters by replacing them with canonical representations.  Using MAT (Memory Analysis Tool), I proved that this was still working as expected.



What I hadn't previously accounted for was the QueryKey.sqlQueryString field.  Since we use a lot of natural id query cache optimizations, we have tens of thousands (and in some cases, well over 100,000) copies of identical queries tucked away in QueryKey as the sqlQueryString (thus, differentiated only by the query parameters).  And since hibernate generated SQL is not exactly terse, we have a nice formula for memory explosion.

Different Tune, Same Dance


We're already using my decorated query cache which eliminates duplicates in the parameters.  So I decide to modify it to also swap out the sqlQueryString for a canonical representation.  One caveat is that sqlQueryString is private and final.  Lo and behold, you can modify private final fields with reflection in Java 1.5!  Granted, you could really do some silly stuff and fake out the JVM and screw up compiler optimizations, but we're only replacing the field with another String that should be functionally equivalent, so hopefully any 'weirdness' is mitigated.  Again, for licensing reasons I won't paste the whole query cache decorator.  Creating the full one (and the factory to instantiate it) is left as an exercise to the reader.



The modified query cache decorator looks like this:

 
private final Map<Object, Object> canonicalObjects = new HashMap<Object, Object>();

 
public boolean put(QueryKey key, Type[] returnTypes,
@SuppressWarnings("unchecked") List result, boolean isNaturalKeyLookup,
SessionImplementor session) throws HibernateException {

// duplicate natural key shortcut for space and time efficiency
if (isNaturalKeyLookup && result.isEmpty()) {
return false;
}

canonicalizeKey(key);

return queryCache.put(key, returnTypes, result, isNaturalKeyLookup,
session);
}

private void canonicalizeKey(QueryKey key) {
try {
synchronized (canonicalObjects) {
canonicalizeParamValues(key);
canonicalizeQueryString(key);
}
} catch (Exception e) {
throw Exceptions.toRuntime(e);
}
}

private void canonicalizeParamValues(QueryKey key)
throws NoSuchFieldException, IllegalAccessException {
final Field valuesField;
valuesField = key.getClass().getDeclaredField("values");
valuesField.setAccessible(true);
final Object[] values = (Object[]) valuesField.get(key);
canonicalizeValues(values);
}

private void canonicalizeQueryString(QueryKey key)
throws NoSuchFieldException, IllegalAccessException {
final Field sqlQueryString;
sqlQueryString = key.getClass().getDeclaredField("sqlQueryString");
sqlQueryString.setAccessible(true);
Object sql = sqlQueryString.get(key);
Object co = ensureCanonicalObject(sql);
if (co != sql) {
sqlQueryString.set(key, co);
}
}

private void canonicalizeValues(Object[] values) {
for (int i = 0; i < values.length; i++) {
Object object = values[i];
Object co = ensureCanonicalObject(object);
values[i] = co;
}
}

// assumes canonicalObjects is locked. TODO: consider a concurrent hash
// map and putIfAbsent().
private Object ensureCanonicalObject(Object object) {
Object co = canonicalObjects.get(object);
if (co == null) {
co = object;
canonicalObjects.put(object, co);
} else if (co != object) {
// System.out.println("using pre-existing canonical object "
// + co);
}
return co;
}



As you can see, we simply change the sqlQueryString via reflection just like we do the param values.

Your Mileage May Vary

How much of a win this is for you depends on your use case.  As I said, since we heavily use the natural id query optimization, we had tons of repeated sql strings.  So, exactly how much memory you 'reclaim' in this fashion totally depends on the variety and uniqueness of the queries you run through the hibernate query cache.

Bonus: L2 Cache Reduction


While we're at it, I noticed a few other things that seemed odd in the L2 cache.  There was more HashMap overhead than there was contents in the cache itself.  I poked around the source a bit and saw that every entry in the L2 cache was being turned into a property->value map before it was stored in your L2 cache provider (and the inverse process occurs on the way out).  This seemed odd to me, as we already have a decomposed CacheEntry object which is an array of the Serializable[] properties from your persistent entity.  Why create another (less efficient) representation as well as introduce unnecessary conversions?  After some google-fu, I realized you can bypass this conversion by setting hibernate.cache.use_structured_entries to false in your hibernate configuration.



Any docs I found on hibernate.cache.use_structured_entries merely seemed to suggest that it stores the properties in a more 'human friendly format.'  And, who wouldn't want that?  And, all examples we built on when first starting with hibernate turned it on, so... so did we.  What they don't mention is what it actually does, and the penalty you pay for doing so -- which is apparently too much HashMap overhead for what should be pretty simple in memory storage.



However, be aware -- this only works for non-clustered, in-memory use of L2 cache.  Apparently, if you cluster your JVMs and need L2 cache clustering, the in/out of the property order cannot be guaranteed between JVMs.  Thus, you have to use structured entities in the cache so they can be re-hydrated properly by property name.

Right-Size Your Caches


We moved to the 1.6 release of EH Cache, so... this may only apply to that version.  But, I noticed that whatever you configure as 'max elements' for ehcache, it uses as the 'min capacity' for the ConcurrentHashMap.  Needless to say, if you size them for your potential largest usage, but then deploy to a smaller-usage environment, you can end up with some waste in the overhead for the hash tables.  It didn't seem terribly significant in my case, but it did show up on the radar.

Wrap-Up


Even in today's multi-gigabyte server (heck, even desktop...) environments, memory is still a precious resource.  Even if you have lots of it available, wouldn't you want to make more use of that memory instead of having it wasted?  Freeing up wasted memory means there is more for the 'transient' objects that come and go quickly.  There's less pressure to garbage collect and try to find available heap.  And, there's more memory available to do smart things with useful caching. In short:



  • Use a query cache decorator to reduce parameter and SQL string duplication.

  • If you are in a single JVM using in memory cache only, use hibernate.cache.use_structured_entries=false in your hibernate configuration.

  • Right-size your caches to reduce hash table overhead.