Wednesday, June 18, 2008

Hibernate: Cache Queries the Natural Id Way

I work on the MySQL Enterprise Tools team, formerly of MySQL and now with Sun Microsystems. The 2.0 version of the Enterprise Monitor is well under way. As part of this, the Java server backend has been refactored to utilize Spring and Hibernate. Honestly, I didn't know either one of those technologies before starting this project. Oh, what a fun road it has been...

A big draw for using an off-the-shelf ORM was so that we didn't have to write our own (kind of bad and slightly wrong -- those darn transactions) caching implementations for the custom one-off ORM that existed previously. A lot of our internal meta-model is very static, so clearly caching would be a HUGE win for performance, right?

Not so fast, turbo. Let me continue...

The headline feature for the 2.0 Monitor is "Query Analysis." Coupled with the MySQL Proxy, the Monitor receives captured query data to/from a MySQL server. Once at the monitor, the data can be aggregated, analyzed, and reported upon. What better test for this feature than to use it on ourselves, to tune ourselves!

And this brings me back to hibernate caching. In the course of monitoring ourselves, I noticed that a certain query was happening WAY more than it should be, just based on my gut feeling. The query in question loaded an object that was generally static -- save for one value that represents the frequency of how often some data should be collected. Its the only mutable value, and once in place, it rarely changes.

Hrm... how to debug. First, we checked the cache settings. Whoops -- WAY too low for both the cache expiry timeout and the max cache elements. Fix that. Still sucks. Some cursory hibernate source and log perusal showed that the cache for these objects was being invalidated at a rapid rate. Yes, the entire cache. Even though the objects are essentially static, the query cache takes the safe route and says any change to a related table invalidates any and all queries that reference that table. This makes sense, from a generic cache standpoint. But I thought to myself -- surely there has to be a way. *I* am smarter than hibernate in this case, and *I* can more rightly determine when the query results should be invalidated. Lucky for me, hibernate allows you to extend the StandardQueryCache "up to date" policy checks. w00t. I implemented one, overrode the timeout policy for the object(s) in question, and re-ran tests. FAILURE. Turns out I am not smarter than hibernate.

However, in the process of implementing the custom query cache policy, I had debugged through some more hibernate code and noticed that "natural id" queries are treated special. Some more google-fu, and quickly I come across Queries by Natural Identifier in the hibernate docs.

Now, the docs just aren't real clear on what optimizations can be made internally by utilizing the Criteria with a natural id restriction. But, as I was just in that section of code, I could correlate it. Here's the meaty bit -- if you make a natural id / key lookup, and hibernate recognizes it as such, it can bypass the table timestamp invalidation and go directly to second-level cache to fetch the object. Hibernate knows, with an immutable and unique natural key, that a table modification will not effect the composition of the object in question (of course, an object modification would, and it would have been evicted from L2 cache).

I cannot overemphasize the utility of this discovery. You see, we were making frequent inserts into the table. But existing objects (rows) where changed almost never. But without the natural key lookup, the inserts invalidated all results in the query cache. There you have why I was seeing way more selects for the same objects than I had anticipated.

Some quick assurances that we mapped the natural id correctly, some quick refactoring of the HQL into Criteria queries with natural id restrictions, and whammo, we're good. Lets run the tests and query analysis again... ruh roh. OH COME ON! (not my exact reaction, but I think you can guess what it was really like).

Confident that the natural id cache lookup optimization was what I really really really wanted, there had to be something else going on. More debuggage ensued. I set a breakpoint near the same area in StandardQueryCache where I first noticed the query cache optimization in the first place. Lo and behold, the hibernate metadata for saying "i am a natural key lookup" was returning false.

I am not amused. I am confident my hibernate mapping is correct, because the unique index was present in the schema. Think. Think. Think. Well, I had recently been on an effort to move from the hbm xml mappings to hibernate annotations mappings. @NaturalId support was, in fact, the very reason I had recently upgraded the annotations jar. On a hunch, I reverted the persistence mapping back to the xml form. Debug, the metadata returns the correct value... test, and YES, finally -- the queries issued are in line with my expectations and the rows present in the database. I. Have. Won.

Being the good open-source citizen, I made a hibernate forum post that detailed my findings, including simplified sample code demonstrating the problem. The good folks on the hibernate forum (after questioning the sillyness of my contrived example) were quick to recognize the problem, and I got a hibernate jira issue opened.

The workaround, obviously, is leave the xml mapping in place until the fix makes it into a hibernate release. Not too bad of a deal, I guess, considering the overall win I now have in my cache hit ratio.

In conclusion -- if it makes sense for your data model, the natural id query cache optimization can be a huge performance win for your app. If you have immutable, or rarely changed objects with a constant natural key lookup -- look into the Criteria natural id restriction. And, use the xml mapping until the bug is fixed.

ps -- there is one other performance note to consider, actually. If using the natural id query, and it returns no rows, this NULL result will not be cached. So, if you have more of these than 'object/row found' results, you will still get tons of these queries that you don't expect. Either stop using the natural key optimization (if 'not found' is more common), or extend your object/schema to include a 'not supported' field. In our case, the lack of a row meant "not supported" and we had a flag "not supported" in case it was supported, but then went away. In those cases where something was frequently "not supported" I simply went ahead and created the object/row and just set the flag to false -- thus ensuring the natural key optimization was not subverted.


4 comments:

Hardy Ferentschik said...

Bug in Hibernate Annotations is fixed now. Haven't traced the code all the way into the caching part, but the meta data is correct now. I am intrigued to hear if this will solve your problem.

Oldag said...

I pulled the latest annotations src from their trunk, built, and ran with it. I'm pleased to say the @NaturalId mapping is fixed, and the query cache optimization works as expected.

You can read more about it back in the hibernate forum thread linked to above.

thanks again, Hibernate team!

Darin Keever said...

So.. if you write a long enough blog, someone fixes something? Guess I need to start blogging.. capriciously.

Irfan Shaik said...

Hi,

We are facing a problem with memcache. We are using hibernate-memcached-version1.5-SNAPSHOT and spymemcached-provider-version3.0.2.

The following are the configuration

persistence.xml
---------------

!-- Enable 2nd Level Cache --

property name="hibernate.cache.use_second_level_cache" value="true"
property name="hibernate.cache.use_query_cache" value="true"

!-- MemCache Configuration --

property name="hibernate.cache.region.factory_class" value="com.googlecode.hibernate.memcached.MemcachedRegionFactory"
property name="hibernate.cache.use_minimal_puts" value="true"
property name="hibernate.cache.use_structured_entries" value="true"
property name="hibernate.memcached.servers" value="${hibernate.memcached.server.host.port}"
property name="hibernate.memcached.cacheTimeSeconds" value="1800"

dto
----

@Cacheable(true)
@org.hibernate.annotations.Cache(usage = CacheConcurrencyStrategy.READ_WRITE)

In GenericDao we are setting query hints cacheable to true for loadAll().

We are using loadAll() method to fetch all the records.
Whenever we made a request the loadAll query is executing and also the queries based on id are executing.

when i refer to the log am able to notice that the data is fetching from database and setting in memcache for a request and when we make another request instead of fetching the data from the memcache it is hitting again to the db and again setting to memcache. Am unable to understand that without modifying any data why it is hitting to db?

Please let me know we are missing anything.